Methods and systems for predictive engine evaluation and replay of engine performance

ABSTRACT

Disclosed are methods and systems of tracking the deployment of a predictive engine for machine learning, including steps to deploy an engine variant of the predictive engine based on an engine parameter set, wherein the engine parameter set identifies at least one data source and at least one algorithm; receive one or more queries to the deployed engine variant from one or more end-user devices, and in response, generate predicted results; receive one or more actual results corresponding to the predicted results; associate the queries, the predicted results, and the actual results with a replay tag, and record them with the corresponding deployed engine variant.

REFERENCE TO RELATED APPLICATIONS

This application is a continuation of and claims the benefit of priority from U.S. Ser. No. 14/797,125, filed on Jul. 11, 2015, entitled “METHODS AND SYSTEMS FOR VISUAL REPLAY OF PREDICTIVE ENGINE PERFORMANCE,” which is a continuation of U.S. Ser. No. 14/684,418, filed on Apr. 12, 2015, entitled “METHODS AND SYSTEMS FOR PREDICTIVE ENGINE EVALUATION, TUNING, AND REPLAY OF ENGINE PERFORMANCE,” which issued as U.S. Pat. No. 9,135,559 on Sep. 15, 2015, and also is a non-provisional of and claims the benefit of provisional application having U.S. Ser. No. 62/136,311, filed on Mar. 20, 2015, and entitled “METHODS AND SYSTEMS FOR PREDICTIVE ENGINE EVALUATION AND TUNING,” the entire disclosures of all of which are hereby incorporated by reference in their entireties herein.

NOTICE OF COPYRIGHTS AND TRADE DRESS

A portion of the disclosure of this patent document contains material which is subject to copyright protection. This patent document may show and/or describe matter which is or may become trade dress of the owner. The copyright and trade dress owner has no objection to the facsimile reproduction by anyone of the patent disclosure as it appears in the U.S. Patent and Trademark Office files or records, but otherwise reserves all copyright and trade dress rights whatsoever.

FIELD OF THE INVENTION

Embodiments of the present invention broadly relate to systems and methods for building and deploying machine learning systems for predictive analytics. More particularly, embodiments of the present invention relate to creating, evaluating, and tuning predictive engines in production, and to replaying the performance of predictive engines for predictive engine design and analysis. A predictive engine includes one or more predictive models that can be trained on collected data for predicting future user behaviors, future events, or other desired information. Such prediction results are useful in various business settings such as marketing and sales. Embodiments of the present invention enable customization of engine components targeted for specific business needs, allow systematic evaluation and tuning of multiple engines or engine variants, and provide ways of replaying engine performances during or after the evaluation and tuning processes.

BACKGROUND OF THE INVENTION

The statements in this section may serve as a background to help understand the invention and its application and uses, but may not constitute prior art.

Machine learning systems analyze data and establish models to make predictions and decisions. Examples of machine learning tasks include classification, regression, and clustering. A predictive engine is a machine learning system that typically includes a data processing framework and one or more algorithms trained and configured based on collections of data. Such predictive engines are deployed to serve prediction results upon request. A simple example is a recommendation engine that suggests a certain number of products to a customer based on pricing, product availability, product similarity, current sales strategy, and other factors. Such recommendations can also be personalized by taking into account user purchase history, browsing history, geographical location, or other user preferences or settings. Some existing tools used for building machine learning systems include APACHE SPARK MLLIB, MAHOUT, SCIKIT-LEARN, and R.

Recently, the advent of big data analytics has sparked more interest in the design of machine learning systems and smart applications. However, even with the wide availability of processing frameworks, algorithm libraries, and data storage systems, various issues exist in bringing machine learning applications from prototyping into production. In addition to data integration and system scalability, real-time deployment of predictive engines in a possibly distributed environment requires dynamic query responses, live model updates with new data, inclusion of business logic, and, most importantly, intelligent and possibly live evaluation and tuning of predictive engines to update the underlying predictive models or algorithms and generate new engine variants. In addition, existing tools for building machine learning systems often provide encapsulated solutions. Such encapsulations, while facilitating fast integration into deployment platforms and systems, make it difficult to identify the causes of inaccurate prediction results. It is also difficult to extensively track the sequences of events that trigger particular prediction results.

Therefore, in view of the aforementioned difficulties, there is an unsolved need to make it easy and efficient for developers and data scientists to create, deploy, evaluate, and tune machine learning systems.

It is against this background that various embodiments of the presentinvention were developed.

BRIEF SUMMARY OF THE INVENTION

The inventors of the present invention have created methods and systems for tracking the deployment of predictive engines for machine learning applications, and for replaying the performances of such predictive engines.

More specifically, in one aspect, one embodiment of the present invention is a method for tracking the deployment of a predictive engine, the method including steps to deploy an engine variant of the predictive engine based on an engine parameter set, wherein the engine parameter set identifies at least one data source and at least one algorithm, and wherein the deployed engine variant listens for and receives one or more queries from one or more end-user devices. In response to the received queries, the deployed engine variant generates one or more predicted results. The method further includes steps to receive one or more actual results corresponding to the predicted results, to associate the queries, the predicted results, and the actual results with a replay tag, and to record them with the corresponding deployed engine variant.

In some embodiments of the present invention, the method further includes steps to receive a replay request specified by one or more replay tags, and in response to the replay request, replay at least one of the queries, the predicted results, and the actual results associated with the one or more replay tags.

In some embodiments of the present invention, the engine parameter set is generated manually by an operator. In other embodiments, the engine parameter set is determined automatically by the system using one or more heuristics, rules, or other procedures. In yet other embodiments, the engine parameter set may be determined automatically, and later edited or modified by the operator before the engine variant is deployed.

In some embodiments, the actual results comprise a sequence of user responses to the predicted results. In some embodiments, the actual results are collected over a delayed time frame, or from one or more cohorts of users. In other embodiments, the actual results are received from a datastore. In other embodiments, the actual results are simulated. In yet other embodiments, the actual results are correct values, actual events, user actions, and/or subsequent end-user behaviors, depending on the uses of the predictive engine.

In another aspect, the present invention is a non-transitory, computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to perform a process for tracking a predictive engine for later replay of engine performance, the instructions causing the processor to perform the aforementioned steps.

In another aspect, the present invention is a system for tracking a predictive engine for replay of engine performance, the system comprising a user device having a processor, a display, and a first memory; a server comprising a second memory and a data repository; a telecommunications link between said user device and said server; and a plurality of computer codes embodied on said memory of said user device and said server, said plurality of computer codes which, when executed, causes said server and said user device to execute a process comprising the aforementioned steps.

In yet another aspect, the present invention is a computerized server comprising at least one processor, memory, and a plurality of computer codes embodied on said memory, said plurality of computer codes which, when executed, causes said processor to execute a process comprising the aforementioned steps.

Yet other aspects of the present invention include the methods, processes, and algorithms comprising the steps described herein, and also include the processes and modes of operation of the systems and servers described herein. Other aspects and embodiments of the present invention will become apparent from the detailed description of the invention when read in conjunction with the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention described herein are exemplary, and not restrictive. Embodiments will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a network configuration diagram in which the present invention may be practiced.

FIG. 2A is a diagram showing a machine learning framework based on a single predictive engine, according to one embodiment of the present invention.

FIG. 2B is a diagram showing a machine learning framework based on multiple predictive engines, according to one embodiment of the present invention.

FIG. 3A is a diagram showing a machine learning framework and the components of a predictive engine involved in training predictive models, according to one embodiment of the present invention.

FIG. 3B is a diagram showing a machine learning framework and the components of a predictive engine involved in responding to dynamic queries to the predictive engine, according to one embodiment of the present invention.

FIG. 4 is a diagram showing the structure of a predictive engine, according to one embodiment of the present invention.

FIG. 5A is a diagram showing a method of automatically tuning parameters of a predictive engine by evaluating a generated list of parameter sets, according to one embodiment of the present invention.

FIG. 5B is a flowchart showing a method of automatically tuning parameters of a predictive engine by evaluating a generated list of parameter sets, according to one embodiment of the present invention.

FIG. 6A is a diagram showing a method of automatically tuning parameters of a predictive engine by evaluating iteratively generated parameter sets, according to one embodiment of the present invention.

FIG. 6B is a flowchart showing a method of automatically tuning parameters of a predictive engine by evaluating iteratively generated parameter sets, according to one embodiment of the present invention.

FIG. 7A is a diagram showing a method of automatically tuning parameters of a predictive engine by evaluating iteratively generated lists of parameter sets, according to one embodiment of the present invention.

FIG. 7B is a flowchart showing a method of automatically tuning parameters of a predictive engine by evaluating iteratively generated lists of parameter sets, according to one embodiment of the present invention.

FIG. 8 is an illustrative diagram showing the process of evaluating and tuning two variants of a predictive engine, according to one embodiment of the present invention.

FIG. 9 is an illustrative graph of actual user actions recorded over a given time period, according to one embodiment of the present invention.

FIG. 10 is one illustrative plot showing how reports of prediction results may be viewed graphically, according to one illustrative embodiment of the invention.

FIG. 11 is another illustrative plot showing how reports of prediction results may be viewed graphically, according to another illustrative embodiment of the invention.

FIG. 12 shows an illustrative system diagram for testing multiple engine variants at the same time, according to one embodiment of the present invention.

FIG. 13 shows an illustrative visual display of prediction performances of a predictive engine over a replay group, according to one embodiment of the present invention.

FIG. 14 shows an illustrative visual display of prediction performances over two replay groups, according to one embodiment of the present invention.

FIG. 15 shows an illustrative visual display of prediction performances over a replay group created using query segment filters, according to one embodiment of the present invention.

FIG. 16 shows an illustrative histogram representing prediction performances over two replay groups, according to one embodiment of the present invention.

FIG. 17 shows one illustrative visual display of prediction performances over multiple replay groups, according to one embodiment of the present invention.

FIG. 18 shows another illustrative visual display of prediction performances over multiple replay groups, according to one embodiment of the present invention.

FIG. 19 shows an illustrative visual display of prediction performances over a replay group, with query records, according to one embodiment of the present invention.

FIG. 20 shows an illustrative visual display of prediction performances over two replay groups, with query records, according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

Some illustrative definitions are provided to assist in understanding the present invention, but these definitions are not to be read as restricting the scope of the present invention. The terms may be used in the form of nouns, verbs, or adjectives, within the scope of the definitions.

- “Prediction engine” and “predictive engine” refer to program code components that are used to make predictions, for example, of how a user might behave given certain inputs. The terms “prediction” and “predictive” are used interchangeably in this description.
- “Data source” refers to a component of a predictive engine for reading data from one or more source(s) of data storage, wherein the data could be training data, test data, real data, live data, historical data, simulated data, and so forth.
- “Data preparator” refers to a component of a predictive engine for automatic preprocessing of data from any data source, possibly into a desired format. The data preparator prepares and cleanses data according to what the predictive engine expects.
- “Algorithm” refers to an algorithmic component of a predictive engine for generating predictions and decisions. The Algorithm component includes machine learning algorithms, as well as settings of algorithm parameters that determine how a predictive model is constructed. A predictive engine may include one or more algorithms, to be used independently or in combination. Parameters of a predictive engine specify which algorithms are used, the algorithm parameters used in each algorithm, and how the results of each algorithm are aggregated or combined to arrive at a prediction engine result, also known as an output or prediction.
- “Serving” refers to a component of a predictive engine for returning prediction results, and for adding custom business logic. If an engine has multiple algorithms, the Serving component may combine multiple prediction results into one.
- “Evaluator” or “Evaluation” refers to a component of a predictive engine for evaluating the performance of the prediction process, so as to compare different algorithms as well as different engine variants.
- “DASE” is an acronym for the Data (including Data Source and Data Preparator), Algorithm (including algorithm parameters), Serving, and Evaluation components, as defined above. All DASE inputs are customizable.
- “Engine variant”, “variant”, and “predictive engine variant” refer to a deployable instance of a predictive engine, specified by a given engine parameter set. An engine parameter set includes parameters that control each component of a predictive engine, including its Data Source, Data Preparator, Algorithm, Serving, and/or Evaluator components.
- “Query” and “Q” refer to a request from an end-user or end-user device for information, for example, a recommendation for a product, a recommended product and its associated price, or other data to be served to the end-user. A query can be seen as an explicit or implicit request for one or more predictive results.
- “Predicted result”, “prediction result”, and “P” refer to a prediction made by a prediction engine, for example, a prediction that an end-user will purchase a given recommended product.
- “Actual result” and “A” include correct values, actual events, as well as user actions or “subsequent end-user behaviors.” Actual results can be correct values to predictive problems such as classifications, actual outcomes or results of future events, and/or any user actions or behaviors from the end-user device specifying what the end-user has done in response to a prediction result provided in response to a query, and so on. Actual results include actual outcomes in the case of a prediction engine predicting actual events. For example, if a prediction engine is used to predict whether a tree will fall down within 24 hours, the “actual result” is the correct value of whether that particular tree actually falls down within the predicted time period. In addition, actual results also include any subsequent end-user behaviors, including but not limited to purchasing the recommended product, clicking on various locations on the end-user device, performing various actions in the end-user application, and so forth. If P=A for a given Q, then the prediction is considered an excellent prediction. The deviation of P from A can be used to define a metric of the accuracy or correctness of a given prediction engine for a given Q.
- “End-user”, or simply “user”, refers to a user of an end-user application that is being implemented and tested using the prediction engine. In one embodiment, the end-users are consumers who utilize a consumer application that employs a prediction engine to serve them recommendations.
- “Operators” are system users who replay prediction scenarios during evaluation. An operator uses a replay system or product, and may be a developer of predictive engines. An operator, in contrast to an ordinary end-user, may be a software developer, a programmer, and/or a data scientist.
- “Prediction score” and “prediction score of a query” refer to a value that represents the prediction performance of a deployed engine variant for a given query. A prediction score is calculated by at least one pre-defined or operator-defined score function, based on the prediction result(s) and actual result(s) associated with the query.
- “Replay groups” refer to segments of queries that may be created with query segment filters, examples of which include the engine variant filter, user attribute filter, item attribute filter, query attribute filter, and other conditional filters capable of selecting a subset of available queries for performance analysis and monitoring.
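
By way of a non-limiting illustration, the relationships among these defined terms can be sketched in a few lines of Python. The class and field names below (EngineParams, ReplayRecord, and so on) are hypothetical conveniences for this description and do not correspond to an actual PredictionIO interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class EngineParams:
    """An engine parameter set: one concrete set of values specifies one engine variant."""
    data_source: str                         # which datastore to read from
    algorithms: List[str]                    # which algorithms are active
    algo_params: Dict[str, Dict[str, Any]]   # per-algorithm parameters
    serving_rules: Dict[str, Any] = field(default_factory=dict)  # business logic

@dataclass
class ReplayRecord:
    """Ties a query (Q), predicted result (P), and actual result (A) to a replay tag."""
    replay_tag: str
    variant_id: str             # identifies the deployed engine variant
    query: Dict[str, Any]       # Q
    predicted: List[str]        # P, e.g. recommended product IDs
    actual: List[str] = field(default_factory=list)  # A, filled in as user actions arrive

# Example: a variant of a recommendation engine and one tagged interaction.
variant = EngineParams(
    data_source="event_server",
    algorithms=["trending", "personalized"],
    algo_params={"personalized": {"rank": 10}},
)
record = ReplayRecord("tag-001", "variant-A", {"user_id": "U1"}, ["P10", "P11"])
record.actual = ["P10", "P20"]  # subsequent end-user behavior
```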

Overview

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures, devices, activities, and methods are shown using schematics, use cases, and/or flow diagrams in order to avoid obscuring the invention. Although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to the suggested details are within the scope of the present invention. Similarly, although many of the features of the present invention are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of the other features. Accordingly, this description of the invention is set forth without any loss of generality to, and without imposing limitations upon, the invention.

Broadly, embodiments of the present invention relate to methods and systems for building and deploying machine learning systems for data analytics. Such machine learning systems may reside on one or more dedicated servers, or on on-site client terminals such as desktop PCs or mobile devices. More particularly, embodiments of the present invention relate to creating and deploying predictive engines in production, and to systematically evaluating and tuning predictive engine parameters to compare different algorithms, engine variants, or engines. In addition, embodiments of the present invention relate to tracking and replaying queries, events, prediction results, and other necessary metrics for deducing and determining the factors that affect the performance of a machine learning system of interest. A replay loop may serve to provide operators (developers and data scientists) insights into the selection and tuning of data sources, algorithms, algorithm parameters, as well as other engine parameters that may affect the performance of a predictive engine.

Generally, to create a smart application involving a machine learning system, a developer needs to first establish and train machine learning models or algorithms using training data collected from one or more sources. Such training data may also be simulated from historical data collected internally or externally by the machine learning system. A system parameter may indicate how training data is prepared and sampled for training predictive models. Next, training data are cleansed and unified into a consolidated format, and may be further randomly sampled or additionally processed, before being passed to and analyzed by the machine learning algorithms to determine system parameters that may specify which algorithms are to be invoked during deployment, and the corresponding algorithmic parameters. The resulting algorithmic parameters provide a trained predictive model. Collectively, the parameters for a machine learning system control and specify the data sources, algorithms, and other components within the system.
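
As a rough, non-authoritative sketch of this train-time flow, the following Python fragment strings together the read, prepare, and train steps described above. The function names are invented for illustration, and scikit-learn's LogisticRegression merely stands in for whichever algorithm the engine actually wraps.

```python
from sklearn.linear_model import LogisticRegression

def read_training_data(source):
    # Placeholder reader: in practice this would pull from the configured data source.
    return source["X"], source["y"]

def prepare(X, y):
    # Cleanse/unify into a consolidated format; here a trivial pass-through.
    return X, y

def train(source):
    X, y = read_training_data(source)
    X, y = prepare(X, y)
    model = LogisticRegression().fit(X, y)  # algorithmic parameters are learned here
    return model

# Toy example: two features, binary labels.
data = {"X": [[0, 1], [1, 0], [1, 1], [0, 0]], "y": [1, 0, 1, 0]}
model = train(data)
print(model.predict([[1, 1]]))
```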

For example, to establish an algorithmic trading system, past prices and market trends may be analyzed to regress and extrapolate for future trading decisions. In this case, analysis of training data may determine regression coefficients for computing future trading prices or volume thresholds. Another example of a machine learning system is a recommendation engine for predicting products that users of an e-commerce website may potentially purchase. Such product recommendations may be personalized, or filtered according to business rules such as inventory conditions and logistical costs. Analysis of training data may determine brand names, price ranges, or product features for selecting and ranking products for display to one or a group of customers. In this example, system parameters may specify which sources are to be employed as training data, what type of data cleansing is carried out, which algorithms are to be used, the regression coefficients, and what business rules are to be applied to prediction results.

Once a machine learning system is established, it can be deployed as a service, for example, as a web service, to receive dynamic user queries and to respond to such queries by generating and reporting prediction results to the user. Alternatively, prediction results may be served in desired formats to other systems associated or not associated with the user. As subsequent user actions or actual correct results are collected and additional data become available, a deployed machine learning system may be updated with new training data, and may be re-configured according to dynamic queries and corresponding event data. In addition, predictive models may be configured to persist, thus becoming reusable and maintainable.

In addition to creating and deploying machine learning systems, the inventors of the present invention have created methods and systems for evaluating and tuning machine learning systems in production. In the present invention, variants of predictive engines and algorithms are evaluated by an evaluator, using one or more metrics with test data. Test data include user queries, predicted results, and actual results or the corresponding subsequent user behaviors or sequences of user actions captured and reported to the evaluator. Test data, including actual results, can also be simulated using data collected internally or externally by the machine learning system. Evaluation results thus generated are used in automatic parameter set generation and selection for the machine learning system. Multiple instances of a predictive engine, or engine variants, may be evaluated at the same time and subsequently compared to determine a dynamic allocation of incoming traffic to the machine learning system. Furthermore, the inventors of the present invention have created methods and systems for monitoring and replaying queries, predicted results, subsequent end-user actions/behaviors or actual results, and internal tracking information for determining the factors that affect the performance of the machine learning system. For example, iterative replay of dynamic queries, corresponding predicted results, and subsequent actual user actions may provide operators insights into the tuning of data sources, algorithms, algorithm parameters, as well as other system parameters that may affect the performance of the machine learning system. Prediction performances may be evaluated in terms of prediction scores and visualized through plots and diagrams. By segmenting available replay data, prediction performances of different engines or engine variants may be compared and studied conditionally for further engine parameter optimization.
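
One simplified way to picture this evaluate-and-compare loop is the sketch below, in which tagged (query, predicted result, actual result) records are scored per engine variant and incoming traffic is then allocated in proportion to the scores. The score function and all names are illustrative assumptions, not the platform's actual metrics.

```python
def score(predicted, actual):
    # Fraction of predicted items the user actually acted on (a made-up metric).
    if not predicted:
        return 0.0
    return len(set(predicted) & set(actual)) / len(predicted)

def evaluate_variant(records):
    # records: list of (query, predicted, actual) tuples for one deployed variant
    scores = [score(p, a) for _, p, a in records]
    return sum(scores) / len(scores) if scores else 0.0

def traffic_split(variant_scores):
    # Dynamic allocation: route traffic in proportion to evaluation scores.
    total = sum(variant_scores.values()) or 1.0
    return {v: s / total for v, s in variant_scores.items()}

records_a = [({"user": "U1"}, ["P10", "P11"], ["P10"])]
records_b = [({"user": "U1"}, ["P3", "P4"], [])]
splits = traffic_split({"A": evaluate_variant(records_a), "B": evaluate_variant(records_b)})
print(splits)  # e.g. {'A': 1.0, 'B': 0.0}
```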

In addition, through an Application Programming Interface (API), these monitoring and replaying methods and systems may work not only for engines deployed on the machine learning system specified here, but also for external engines and algorithms. In other words, implementations of the monitoring and replaying of engine configuration and performances may be separate from the engine deployment platform, thus allowing external monitoring and replaying services to be provided to existing predictive engines and algorithms.

One feature of the present invention is its focus on engine parameters instead of just algorithmic parameters. Engine parameters include hyperparameters such as data sources, the algorithms employed, and business logic parameters, in addition to configuration and data inputs to individual algorithms. Such engine-level considerations allow engine-level comparisons. Instead of tuning algorithmic parameters alone, embodiments of the present invention allow the additional selection of data sources, algorithms, business rules, and any other characteristic of the engine under consideration. Engine variants may be chosen by an operator or a developer, based on a template with default values, or generated automatically. Multiple variants of an engine deployed according to different engine parameter sets can thus utilize different algorithms or data sources, offering a much wider variety of deployable engine instances for comparison and much more flexibility for performance optimization.

Another feature of the present invention is that it is capable of tracking multiple user actions, behaviors, or responses, both immediately and over a delayed time frame. Sequences of user actions, such as mouse clicks followed by an online purchase, may be grouped and tracked under the same tracking tag or replay tag associated with a particular query. In addition, user actions may be tracked across different sessions or cohorts, according to different segmentation rules.

With the ability to track and replay prediction history, embodiments of the present invention not only allow developers and data scientists to track prediction accuracy, but also enable them to troubleshoot and reconfigure the system as needed. Instead of just returning prediction success or failure rates for determining whether one variant performs better than another, embodiments of the present invention can replay the whole prediction scenario, from engine parameters, queries, and prediction results to actual results, user interactions, and evaluation metrics, to help developers understand particular behaviors of engine variants of interest, and to tailor and improve prediction engine design. The graphical or textual visual replay of evaluation and tuning results not only makes the whole process easier to use, but also allows interactive engine parameter tuning by an operator.

PredictionIO is a trademark name carrying embodiments of the present invention, and hence, the aforementioned trademark name may be used interchangeably in the specification and drawings to refer to the products/services offered by embodiments of the present invention. The term PredictionIO may be used in this specification to describe the overall machine learning system creation, evaluation, and tuning processes of the invention. The term “PredictionIO Enterprise Edition” refers to one version of the PredictionIO platform offered and sold to enterprise customers, with certain enhanced features above the baseline version. Of course, the present invention is not limited to the trademark name PredictionIO, and can be utilized under any naming convention or trademark name whatsoever.

With reference to the figures, embodiments of the present invention are now described in detail.

System Architecture

FIG. 1 shows a schematic diagram of a network configuration 100 for practicing one embodiment of the present invention. A user-device or devices 110 may be connected to a PredictionIO server or platform 130 through network connection 120. For example, a user-device may be a smart phone 102, laptop 104, desktop PC 106, or tablet 108. A user-device may also be a wearable device such as a watch, smart glasses, or an electronic tag. A user-device may be activated by user actions or pre-installed programs. PredictionIO server 130 is a platform for creating, deploying, evaluating, and tuning machine learning systems. In some embodiments, PredictionIO server 130 is a predictive engine deployment platform where predictive engines are machine learning systems for generating predictions and decisions. In some embodiments, PredictionIO server 130 is a distributed system. For example, the data store and processing algorithms may be located on different devices; engine deployment, monitoring, and evaluation may also be implemented separately. In this embodiment, PredictionIO server 130 is connected to one or more user devices 110 through the wireless network or the wired network 124. The wireless network comprises a cellular tower 122 or a wireless router 126. The wired network 124 or the wireless network may employ technologies and protocols comprising Ethernet technology, Local Area Network (LAN), Wide Area Network (WAN), optical network, and the like. In another embodiment of the present invention (not shown here), PredictionIO server 130 may be implemented directly in a user-device such as 102, 104, 106, or 108. Local installations of the PredictionIO service remove the remote connectivity requirements of the network configuration 100. Local installations of PredictionIO server 130 may be subject to additional software or hardware constraints.

FIG. 2A is a diagram showing an architectural overview 200 of a deployable machine learning framework based on a single predictive engine, according to an exemplary embodiment of the present invention. In this embodiment, PredictionIO server 210 is composed of event server 212 and a predictive engine 214. Event server 212 is a scalable data collection and analytics layer. Event server 212 is responsible for importing data 211, in real-time or in batch, from user application 220, which may be a mobile application, a website, an email campaign, an analytical tool, or any other type of application that may receive or collect user input, actions, or information. “User” refers to an entity that interacts with PredictionIO server 210 or predictive engine 214, and may or may not be a person. In one embodiment, event server 212 uses Apache HBase as the data store. Event server 212 unifies data received from multiple channels or sources into unified data 213. For example, such multiple channels or sources may be one or more user applications, or different logical storage units on one or more user applications or devices. In some embodiments, one data source may indicate what a user or customer has done on a mobile application, another data source may indicate the customer's browsing history, and yet another data source may indicate user behaviors within a retail store. In this example, data 211 may comprise user IDs, product IDs, product attributes, user preferences and user ratings for particular products, as well as other user actions. Event server 212 unifies or aggregates data 211, possibly into a preferred format, under a user email address or user login ID if such information is known. Alternatively, data 211 may be tagged with an entity ID such as a cookie ID for users or visitors who have not logged into the system. In production, new event data may be continuously pushed into event server 212, which in turn integrates the new data with the existing datastore. When new data are integrated, predictive engine 214 may be trained automatically or upon request, and the resulting new model may be exchanged with the existing model. In summary, event server 212 serves two main purposes: it provides data to predictive engines for model training and evaluation, and it offers a unified view for data analysis. In addition, like a database server, an event server can host multiple applications.
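
The unification step performed by event server 212 may be pictured, in highly simplified form, as keying heterogeneous events to a single entity ID. The toy class below is a sketch under that assumption; a production event server (for example, one backed by Apache HBase, as noted above) additionally handles scale, schemas, and batch imports.

```python
from collections import defaultdict

class ToyEventServer:
    """Collects events from multiple channels and unifies them per entity ID."""

    def __init__(self):
        self.events = defaultdict(list)

    def import_event(self, entity_id, channel, payload):
        # entity_id may be a login/email for known users or a cookie ID otherwise.
        self.events[entity_id].append({"channel": channel, **payload})

    def unified_view(self, entity_id):
        return self.events[entity_id]

server = ToyEventServer()
server.import_event("user@example.com", "mobile_app", {"action": "view", "item": "P1"})
server.import_event("user@example.com", "website", {"action": "rate", "item": "P2", "rating": 4})
print(server.unified_view("user@example.com"))
```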

In some embodiments of the present invention, event server 212 may be a component of predictive engine 214 instead of being an independent entity. In addition, not all input data to predictive engine 214 must be streamed from event server 212. In some embodiments, predictive engine 214 may read data from another datastore instead of event server 212.

Based on unified data 213, predictive engine 214 can be created. Predictive algorithms can be selected to represent a given type of prediction problem or task. Examples of prediction tasks include recommendations and classifications. For instance, a similar item recommendation task may seek to predict items that are similar to those on a given list; a personalized recommendation task may seek to predict which items a given user or users are inclined or more likely to take actions on; and a classification task may seek to predict whether a given document or text body is a suggestion or a complaint. PredictionIO server 210 may provide template predictive engines that can be modified by a developer for rapid development of system 200. Predictive engine 214 may contain one or more machine learning algorithms. It reads training data to build predictive models, and may be deployed as a web service through a network configuration 100 as shown in FIG. 1 after being trained. A deployed engine responds to prediction queries from user application 220, possibly in real-time, or over a given span of time.

After data 211 are sent to event server 212, continuously or in a batch mode, predictive engine 214 can be trained and deployed as a web service. User application 220 may then communicate with engine 214 by sending in a query 215, through an Application Programming Interface (API) or a REST interface; such interfaces may be automatically provided by PredictionIO platform 210. An exemplary query is a user ID. In response, predictive engine 214 returns predicted result 218 in a pre-defined format through a given interface. An exemplary predicted result is a list of product IDs. In the classification example previously discussed, query 215 may be a paragraph of text input, and predicted result 218 may be an alphanumerical string that indicates whether the input text is a suggestion or a complaint. In the similar item recommendation task, query 215 may be a set of item IDs such as (P1, P2, P3), while predicted result 218 may be another set of item IDs such as (P10, P11), indicating that products P10 and P11 are similar to the given products P1, P2, and P3. Similarity among different items may be defined through numerical scores and/or non-numerical criteria. In the personalized recommendation task, query 215 may be a user ID, while predicted result 218 may be a set of item IDs such as (P10, P11), indicating that the user with the given ID is more likely to take actions on products P10 and P11.
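
The queries and predicted results in these examples are simply structured payloads. They are shown below as Python dictionaries purely for illustration; a deployment would typically serialize such payloads as JSON over the REST or API interface mentioned above.

```python
# Similar item recommendation: query with item IDs, response with similar item IDs.
query = {"items": ["P1", "P2", "P3"], "num": 2}
predicted_result = {"items": ["P10", "P11"]}

# Personalized recommendation: query with a user ID.
query = {"user_id": "U42", "num": 2}
predicted_result = {"items": ["P10", "P11"]}

# Classification: query with free text, response with a label.
query = {"text": "The checkout button does not work on my phone."}
predicted_result = {"label": "complaint"}
```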

FIG. 2B is a diagram showing an architectural overview of a deployable machine learning framework 250 based on multiple predictive engines, according to one embodiment of the invention. Here, each of mobile application 270, website 272, and email campaign 274 sends user input, behavior, and/or other related data 261 to event server 262, continuously or in a batch mode. Instead of a single predictive engine, different engines, shown as engines 264, 265, 266, and 267, may be built for different purposes within PredictionIO server or platform 260. In the product recommendation example, engine 264 may help a customer browsing an e-commerce website discover new products of interest. Another engine 265 may be used for generating product recommendations or sales notifications in an email campaign. For instance, based on what the customer has browsed in the past few days, a similar or a related product may be presented in an email newsletter to the customer so that the customer will return to the e-commerce website. In this particular example, browsing history in the form of data 261 may be collected through mobile application 270 and website 272 over a given span of time such as an hour, a day, or a week; query 263 may be generated automatically by an email client; and predicted result 268 may be served in the form of text or graphical elements through email campaign 274.

Similar to system 200 shown in FIG. 2A, each of mobile application 270, website 272, and email campaign 274 may communicate with engines 264, 265, 266, and 267 by sending in data 261 or query 263. A subset or all of the available predictive engines may be active, depending on data 261 or other engine parameter settings as configured through PredictionIO server 260. In response, the predictive engines return one or more predicted results 268, individually or in combination, in a possibly pre-defined format.

Even though only three user applications 270, 272, 274 and four predictive engines 264, 265, 266, 267 are shown in FIG. 2B, system 250 may be scaled to include many more user applications, and PredictionIO server 260 may be scaled to include fewer or many more predictive engines. The additional user applications may each reside on the same or separate devices or storage media. In addition, PredictionIO server 260 may be scaled to include multiple predictive engines of different types on the same platform. Event server 262 may function to provide input data to all predictive engines, or more than one event server may be implemented within PredictionIO server 260. For example, depending on the type of prediction required, subsets of data 261 may be stored separately in multiple event servers and indexed correspondingly.

FIG. 3A is a diagram showing the components of a predictive engine involved in training predictive models within the predictive engine, according to one embodiment of the present invention. After user data have been collected into event server 310, they can be pulled into data source 322 of predictive engine 320. In addition to reading data from a datastore, data source 322 may further process data from event server 310 according to particular settings of predictive engine 320. Data source 322 then outputs training data 323 to data preparator 324, which cleanses and possibly reformats training data 323 into prepared data 325. Prepared data 325 are then passed to all algorithms 330 to 334, automatically or upon request. Predictive algorithms such as algorithms 330 to 334 here are components of a predictive engine for generating predictions and decisions. A predictive engine may include one or more algorithms, to be used independently or in combination. For example, separate algorithms may be employed to handle different types of user event data, or a single algorithm may be implemented to take different types of user event data into account. Each algorithm is configured to perform at least two functions, train and predict: one trains the corresponding predictive model, while the other employs the predictive model to generate a predicted result. During training, each algorithm returns a predictive model, which is in turn cached by PredictionIO server 300 so that models may persist and can be returned once recommendations need to be made. The models may be in a distributed or a non-distributed object format, and PredictionIO server 300 may provide dedicated programming class interfaces for accessing such model objects.
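
The two-function contract of each algorithm (train and predict) might be sketched as follows. This Python rendering is an analogy only; the class names and the toy trending logic are invented for this description and are not the platform's actual interfaces.

```python
class Algorithm:
    """Each algorithm implements train() to build a model and predict() to use it."""

    def train(self, prepared_data):
        raise NotImplementedError

    def predict(self, model, query):
        raise NotImplementedError

class TrendingAlgorithm(Algorithm):
    def train(self, prepared_data):
        # The "model" here is simply items ranked by popularity in the prepared data.
        counts = {}
        for event in prepared_data:
            counts[event["item"]] = counts.get(event["item"], 0) + 1
        return sorted(counts, key=counts.get, reverse=True)

    def predict(self, model, query):
        return model[: query.get("num", 3)]

algo = TrendingAlgorithm()
model = algo.train([{"item": "P1"}, {"item": "P2"}, {"item": "P1"}])
print(algo.predict(model, {"num": 1}))  # ['P1']
```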

To facilitate the creation and deployment of a predictive engine, a PredictionIO server such as 300 may provide programming templates for creating each component of predictive engine 320. For example, a read function of data source 322 may be called directly to return training data 323, and a prepare function of data preparator 324 may be called to process training data 323 into prepared data 325. Each of algorithms 330 to 334 processes prepared data 325 to determine model or object parameters.

FIG. 3B is a diagram showing the components of a predictive engine involved in responding to dynamic queries to the predictive engine, according to one embodiment of the present invention. After predictive engine 320 has been trained, it can be deployed as a web service through network 120 as shown in FIG. 1, or as a local installation on client devices. Once trained and deployed, predictive engine 320 may respond to dynamic query 362 from user application 360. Query 362 may be in a predefined format, and predictive engine 320 may conduct further conversion of query data 362 before passing it to one or more trained algorithms or models 330 to 334, to trigger a predict function within each algorithm that has defined this particular function. As a result, each active algorithm or predictive model returns a predicted result in response to dynamic query 362. For example, the predicted result may be a list of product IDs, or a list of product recommendation scores associated with a list of product IDs. The predicted results are passed to a serving component 340 of predictive engine 320. Serving component 340 further processes and aggregates the prediction results to generate a predicted result 345 for output back to user application 360. An algorithm's predict function and serving component 340 may further include real-time business logic for filtering and processing prediction results from some of algorithms 330 to 334. For example, while in production, a product inventory may become depleted, and a product recommendation for purchase may thus need to be adjusted accordingly. In another example, serving component 340 may take into account logistical costs to determine whether products within a particular price range are more likely to be considered by a customer, and thus should be recommended to the customer through user application 360. Alternatively, serving component 340 may combine prediction results from a selected subset of algorithms. The returned predicted result 345 may be automatically structured into a programming object easily convertible to other formats by PredictionIO platform 300.
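
A serving component of the kind described above might merge per-algorithm results and then apply a real-time business rule, such as dropping depleted inventory. The following is a minimal sketch under those assumptions; the function name and the specific rule are invented for illustration.

```python
def serve(algorithm_results, inventory):
    """Combine per-algorithm recommendation lists, then filter by a business rule."""
    combined, seen = [], set()
    for results in algorithm_results:          # e.g. one list per active algorithm
        for item in results:
            if item not in seen:
                seen.add(item)
                combined.append(item)
    # Real-time business logic: only recommend items still in stock.
    return [item for item in combined if inventory.get(item, 0) > 0]

results = serve([["P10", "P11"], ["P11", "P12"]], {"P10": 5, "P11": 0, "P12": 2})
print(results)  # ['P10', 'P12'] because P11 is depleted
```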

To facilitate the evaluation and tuning of predictive engine 320, its inputs, outputs, and internal parameters may be tagged and replayed. More detailed descriptions will be provided with reference to FIGS. 4 to 8.

FIG. 4 is a diagram showing the overall structure of a predictive engine 400, according to one embodiment of the present invention. Predictive engine 400 may be separated into four major components: Data 420, Algorithm 430, Serving 440, and Evaluator 450, also known as a “DASE” architecture. The first three components, Data 420, Algorithm 430, and Serving 440, have been discussed with reference to FIGS. 3A and 3B. This DASE architecture provides a separation of concerns (SoC) that allows developers to exchange and replace individual components in predictive engine design. In other words, the DASE architecture is a Model-View-Controller (MVC) for machine learning systems. All components of the DASE architecture are controlled by an engine factory object (not shown here) defined as part of a PredictionIO server.

The first Data component 420 refers to data source 422 and data preparator 424. In FIG. 3A, data source 322 and data preparator 324 receive data from event server 310. Similarly, in FIG. 4 here, data source 422 imports application data 410, possibly from an event server implemented on a PredictionIO platform. Data source 422 functions as a reader of internal or external datastores, while data preparator 424 cleanses training data before passing prepared data to Algorithm component 430 of predictive engine 400. Some exemplary functions of data preparator 424 are to reformat and aggregate training data as desired, and to sample a subset of training data using a pre-defined random sampling strategy. In some embodiments, data preparator 424 may be excluded from Data component 420, and training data may be passed directly from data source 422 to Algorithm component 430 of predictive engine 400. The inclusion or exclusion of data preparator 424 may be useful in evaluating the performance of predictive engine 400 under different settings or configurations.

The second Algorithm component 430 of predictive engine 400 comprises one or more algorithms, denoted as algorithms 432 to 436 in FIG. 4. A very simple example of an algorithm within Algorithm component 430 is a non-personalized, trending algorithm that recommends products which are most popular in the store at the moment. A more complicated example may be a personalized algorithm that takes into account products a particular customer has purchased in the past. A single predictive engine 400 may contain multiple algorithms; each can be trained as discussed previously with reference to FIG. 3A, and activated or called upon request as discussed previously with reference to FIG. 3B. However, not all algorithms have to be trained or called at the same time. The selection of algorithms within Algorithm component 430 could depend on the availability of training data, computing resources, or other factors. The selection of algorithms is specified by the parameters of predictive engine 400. In addition, a subset of algorithms can be selected for best performance, as will be discussed with reference to FIG. 8. Furthermore, data from data preparator 424 may be sampled separately for each algorithm for best performance. In some embodiments, the output of the training process includes a model part and a meta-data part. The trained models and meta-data are stored in a local file system, in HDFS, or in another type of storage. Meta-data may include model versions, engine versions, application ID mappings, and evaluation results.

Predicted results such as 431, 433, and 435 from the activated algorithms are passed to Serving component 440. Serving component 440 can combine, filter, and further process the prediction results according to real-time business rules to generate predicted result 445. Such business rules may be updated periodically or upon request.

In addition, to evaluate the performance of the prediction process and to compare different algorithms, algorithm parameter settings, as well as different engine variants, an Evaluator component 450 receives data from Serving component 440, and applies one or more metrics to compute evaluation result 455 as an output. An engine variant is a deployable instance of a predictive engine, specified by an engine parameter set. The engine parameter set includes parameters that control each component of a predictive engine. An evaluation metric may quantify prediction accuracy with a numerical score. Evaluation metrics may be pre-defined with default computation steps, or may be customized by developers who utilize the PredictionIO platform.

Although not explicitly shown in FIG. 4, Evaluator 450 may receive actual results, including correct values, user actions, or actual user behaviors, from a datastore or a user application for computing evaluation metrics. An actual result refers to a correct prediction result or an actual outcome of a prediction task. If a predicted result is the same as an actual result, the predicted result can be considered an excellent prediction. Recall the exemplary queries and corresponding predicted results discussed with reference to FIG. 2A. In the classification task, an actual result may be the string “complaint”, which is the correct classification of the text input. In the similar item recommendation task, an actual result may be product IDs (P10, P20), indicating that products P10 and P20 are similar to the given items (P1, P2, P3), although the predictive engine suggested products P10 and P11. In a personalized recommendation task, an actual user behavior may be product IDs (P10, P20), indicating that the user selected products P10 and P20 for further viewing and purchase after products P10 and P11 were recommended by the predictive engine. Another example of actual results arises in algorithmic trading, where an actual result may be the actual opening or closing price of a particular stock on the next day. Actual results may be collected through user devices, read from storage, or simulated.
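
Under these definitions, an evaluation metric reduces to a function of P and A for each Q, aggregated across queries. For instance, a simple precision-style score (illustrative only, not a metric mandated by the platform) can be computed as follows.

```python
def precision(predicted, actual):
    # Portion of predicted items confirmed by the actual result (P vs. A for one Q).
    return len(set(predicted) & set(actual)) / len(predicted) if predicted else 0.0

# Similar item example from the text: the engine suggested (P10, P11), actual was (P10, P20).
print(precision(["P10", "P11"], ["P10", "P20"]))  # 0.5, i.e. half the predictions were correct
```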

Prediction result 445 and evaluation result 455 can be passed to other components within a PredictionIO server. As discussed previously, a PredictionIO server is a predictive engine deployment platform that enables developers to customize engine components, evaluate predictive models, and tune predictive engine parameters to improve the performance of prediction results. A PredictionIO server may also maintain adjustment history, in addition to prediction and evaluation results, for developers to further customize and improve each component of an engine for specific business needs.

In some embodiments of the present invention, Apache Spark can be used to power the Data, Algorithm, Serving, and Evaluator components. Apache Spark is a large-scale data processing engine. In this case, distributed algorithms and single-machine algorithms may both be supported by the PredictionIO server.

Engine Parameter Tuning

A predictive engine within a PredictionIO platform is governed by a set of engine parameters. Engine parameters determine which algorithms are used and what parameters are to be used for each algorithm chosen. In addition, engine parameters dictate the control of the Data component, Algorithm component, and Serving component of a predictive engine. In other words, engine parameters include parameters for each component controller. As engine parameters essentially define how an engine is to function, engine parameters are hyperparameters. A given set of engine parameters specifies an engine variant.

The determination and tuning of engine parameters is the key to generating good predictive engines. The Evaluator component, also called an evaluation module, facilitates the engine tuning process to obtain the best parameter set. For example, in a classification application that uses a Bayesian algorithm, an optimal smoothing parameter for making the model more adaptive to unseen data can be found by evaluating the prediction quality against a list of parameter values to find the best value.

In some embodiments, to evaluate engine parameters, the available data can be split into two sets: a training set and a validation set. The training set is used to train the engine, as discussed with reference to FIG. 3A, while the validation set is used to validate the engine by querying the engine with the validation set data, as discussed with reference to FIG. 3B. Validation set data include actual results or actual user behaviors. One or more metrics can be defined to compare the predicted results returned from the engine with the actual results among the validation data. The goal of engine parameter tuning is to determine an optimal engine parameter set that maximizes the evaluation metric scores: the higher the score, the better the engine parameter set. For example, a precision score may be used to measure the portion of correct predictions among all data points. In some embodiments, training and validation data are simulated by the PredictionIO platform.
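
The split-train-validate procedure described above might look like the following sketch, which sweeps a candidate smoothing parameter for a Bayesian classifier, as in the example of the preceding paragraphs. The use of scikit-learn's MultinomialNB and its alpha parameter is an assumption for illustration, and model.score reports plain accuracy, standing in for the precision score described above.

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X = [[3, 0], [0, 2], [4, 1], [1, 3], [5, 0], [0, 4], [2, 1], [1, 2]]
y = [0, 1, 0, 1, 0, 1, 0, 1]

# Split the available data into a training set and a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

best_alpha, best_score = None, -1.0
for alpha in [0.01, 0.1, 1.0, 10.0]:           # candidate smoothing parameter values
    model = MultinomialNB(alpha=alpha).fit(X_train, y_train)
    score = model.score(X_val, y_val)          # portion of correct predictions on validation data
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)
```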

FIG. 5A is a use flow diagram 500 showing a method of automatically tuning the parameters of a predictive engine by evaluating a generated list of parameter sets, according to one embodiment of the present invention. Correspondingly, FIG. 5B is an exemplary flow diagram 550 showing a detailed implementation of the use flow 500 shown in FIG. 5A. In FIG. 5A, given an engine type 502, a parameter generator generates a list of engine parameter sets all at once at step 510. In some embodiments, a list of engine parameter sets can be generated from a base engine parameter set by adding or replacing controller parameters. In some embodiments, a list of engine parameter sets can be generated from a base engine parameter set by incrementally changing the value of one parameter within the base parameter set. The base engine parameter set may take on default values stored in a PredictionIO platform, may be generated manually by an operator, or may be generated automatically. In some embodiments, the base engine parameter set may be derived from previous engine parameter set tuning and evaluation steps not shown in FIG. 5A. The base engine parameter set may also be included in the newly generated engine parameter sets. In other words, one of the newly generated engine parameter sets may equal the base engine parameter set.

The generated list of engine parameter sets 515 is evaluated one by one at step 520 according to a chosen evaluation metric or multiple chosen metrics, until timeout or until a maximum number of tests is reached. In the example shown in FIG. 5A, the n-th engine parameter set is represented as the tuple (x_n, y_n, z_n, . . . ), where each element of the parameter set may take on a different variable type. In some embodiments, a baseline engine variant is presented as an optional input 527 and is also evaluated. Baseline engine variant 527 is of engine type 502, and may take on default engine parameter values stored in a PredictionIO platform, may be generated manually by an operator, or may be generated automatically. The parameter values, evaluation score, and computation time of each engine parameter set and of the baseline engine variant are reported at step 530 as output 535. Subsequently, a new predictive engine variant is created at step 540 with the parameter set having the best score. If a baseline engine variant is present, an engine variant is created only if the best score is better than the baseline engine variant's score. The whole engine and its complete parameter set (the entire DASE stack; see the Definitions section), or any sub-component and its associated parameters, may be tuned. This illustrative example shows the tuning of engine parameter sets; in other words, the Data Source/Data Preparator, Algorithm, Serving, and Evaluation components and their parameters can all be tuned in the manner presented herein.

FIG. 5B illustrates an exemplary implementation of the engine parameter tuning process shown in FIG. 5A as a flow diagram 550. At step 555, a list of a given number N of parameter sets to be evaluated is generated. At step 560, an iteration index n is set to 1. Evaluation of the n-th parameter set is carried out at step 565, and the evaluation result is stored along with the n-th parameter set itself. If neither a maximum number of tests MAX_N nor a timeout has been reached at step 570, the parameter generation and evaluation processes continue through step 572, where the iteration index n is incremented. Otherwise, the presence of a baseline engine variant is considered at step 575. Without a baseline engine variant, the parameter sets and corresponding evaluation results are reported at step 580, and a new engine variant with the parameter set of the best score is created at step 585 before the tuning process terminates at step 590. If a baseline engine variant is present, the baseline engine variant is evaluated at step 576 and its evaluation result is reported at step 577 and compared to the best score out of the list of parameter sets at step 578. A new engine variant is then created only if the best score is better. In addition to the process shown in flow chart 550, alternative implementations of the use flow 500 are also possible.
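
In pseudocode terms, the flow of FIGS. 5A and 5B amounts to a bounded loop over a pre-generated list of parameter sets, with an optional baseline comparison at the end. The sketch below is illustrative only; evaluate() and the parameter format are placeholders, not the platform's actual interfaces.

```python
import time

def tune_from_list(param_sets, evaluate, max_n=100, timeout_s=3600, baseline=None):
    """Evaluate a pre-generated list of engine parameter sets; return the winner, if any."""
    deadline = time.time() + timeout_s
    results = []
    for n, params in enumerate(param_sets, start=1):
        if n > max_n or time.time() > deadline:   # stop at MAX_N or timeout
            break
        results.append((params, evaluate(params)))
    best_params, best_score = max(results, key=lambda r: r[1])
    if baseline is not None and evaluate(baseline) >= best_score:
        return None   # keep the baseline: no candidate beat it
    return best_params  # create the new engine variant from this set

# Toy example: the score is highest when x is near 3.
candidates = [{"x": x} for x in range(6)]
print(tune_from_list(candidates, lambda p: -abs(p["x"] - 3)))  # {'x': 3}
```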

FIG. 6A is a use flow diagram 600 showing a method of automatically tuning parameters of a predictive engine by evaluating iteratively-generated parameter sets, according to one embodiment of the present invention. Correspondingly, FIG. 6B is an exemplary flow diagram 650 showing a detailed implementation of the use flow 600 shown in FIG. 6A. In FIG. 6A, given an engine type 602, a parameter generator generates a first engine parameter set at step 610. The newly-generated engine parameter set 615 is evaluated at step 620 according to one or more pre-defined metrics, and the evaluation result 625 is returned to the parameter generator, unless a maximum number of tests or timeout has been reached. The parameter generator then generates the next engine parameter set, based on evaluation results of some or all of the previous engine parameter sets. In some embodiments, a baseline engine variant is presented as an optional input 627 and is also evaluated. Baseline engine variant 627 is of engine type 602, and may take on default engine parameter values stored in a PredictionIO platform, may be generated manually by an operator, or may be generated automatically. The parameter value, evaluation score, and computation time of each parameter set and of the baseline engine variant are reported at step 630 as output 635, and an engine variant is created, or chosen, with the parameter set of the best score at step 640. If a baseline engine variant is present, a new engine variant is created only if the best score is better than the baseline engine variant's score. This illustrative example shows the tuning of engine parameter sets; the Data source/data preparator, Algorithm, Serving, and Evaluation components and their parameters can all be tuned in this manner as presented herein.

FIG. 6B illustrates an exemplary implementation of the engine parameter tuning process shown in FIG. 6A as a flow diagram 650. At step 655, a first set of engine parameters is generated, evaluated, and the corresponding results are stored. The iteration index n is set to 2. The first set of engine parameters may be generated from a base engine parameter set, where the base engine parameter set may take on stored default values, or may be derived from previous engine parameter set tuning and evaluation steps not shown here. The first set of engine parameters may be equal to the base engine parameter set. At step 660, the n-th engine parameter set is generated, based on evaluation results of some or all of the previous (n−1) engine parameter sets. Evaluation of the n-th parameter set is carried out at step 665, and the evaluation result is stored in addition to the n-th engine parameter set itself, for later reporting. If neither a maximum number of tests MAX_N nor timeout has been reached at step 670, the parameter generation and evaluation processes continue through step 672. Otherwise, the presence of an optional baseline engine variant is considered at step 675. Without a baseline engine variant, the parameter sets and corresponding evaluation results are reported at step 680, and a new engine variant with the parameter set of the best score is created at step 685 before the tuning process terminates at step 690. If a baseline engine variant is present, it is evaluated at step 676, its evaluation result is reported at step 677 and compared to the best score from the list of parameter sets at step 678. An engine variant is then created only if the best score is better. In addition to the process shown in flow chart 650, alternative implementations of the use flow 600 are also possible.
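The iterative generation of step 660 may be illustrated with a minimal sketch. The following hypothetical generator performs a simple one-parameter hill climb over a single numeric parameter, here called "lambda"; the name, step size, and strategy are assumptions for illustration only, since the parameter generator may implement any strategy that uses previous evaluation results.

object IterativeTuning {
  type ParamSet = Map[String, Double]

  // Hypothetical generator for the n-th parameter set (step 660). The history
  // list holds (parameter set, score) pairs, most recent first.
  def nextParamSet(history: List[(ParamSet, Double)]): ParamSet = history match {
    case (p1, s1) :: (_, s0) :: _ =>
      // Keep nudging in the same direction if the score improved, else reverse.
      val step = if (s1 >= s0) 0.01 else -0.01
      p1.updated("lambda", p1("lambda") + step)
    case (p0, _) :: Nil =>
      p0.updated("lambda", p0("lambda") + 0.01) // only the first evaluation exists (step 655)
    case Nil =>
      Map("lambda" -> 0.01) // fall back to a base engine parameter set
  }
}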

In some embodiments, a PredictionIO platform may deploy a variant of a given predictive engine with an initial set of engine parameters or an initial engine parameter setting. The initial engine parameter set may take on default values stored in memory, may be generated manually by an operator, or may be determined automatically. The deployed engine variant then receives queries, responds with predicted results, and receives back actual results. Evaluation results are then generated, and the current engine parameter set and evaluation results are passed to an engine parameter generator. From time to time, the engine parameter generator generates a new parameter set based on evaluation results of the current variant, and sometimes, evaluation results of previously deployed variants. Such previously deployed variants may have been replaced by previously generated new engine parameter sets, and evaluation results of previously deployed variants may have been stored by the PredictionIO platform. The new engine parameter set generated in the current round may then be deployed to replace the existing engine variant. Replacing old engine variants is an optional feature, as old engine variants may also remain in memory for future analysis and comparison, if desired or necessary.

FIG. 7A is a use flow diagram 700 showing a method of automatically tuning parameters of a predictive engine by evaluating iteratively-generated lists of parameter sets, according to one embodiment of the present invention. Correspondingly, FIG. 7B is an exemplary flow diagram 750 showing a detailed implementation of the use flow 700 shown in FIG. 7A. In FIG. 7A, given an engine type 702, a parameter generator generates a first list, or batch, of engine parameter sets at step 710. The current list of engine parameter sets 715 is then evaluated according to one or more pre-defined metrics at step 720, and the evaluation results 725 are returned to the parameter generator, unless a maximum number of tests or timeout has been reached. The parameter generator then generates the next list of engine parameter sets, based on evaluation results of the previous list of engine parameter sets. In the example shown in FIG. 7A, the n-th list of engine parameter sets is represented as tuples {(xn1, yn1, zn1, . . . ), (xn2, yn2, zn2, . . . ), . . . }, where each element of a parameter set may take on textual or numerical values. In some embodiments, a baseline engine variant is presented as optional input 727 and is also evaluated. Baseline engine variant 727 is of engine type 702, and may take on default engine parameter values stored in a PredictionIO platform, may be generated manually by an operator, or may be generated automatically. The parameter values, evaluation scores, and computation times of each of the generated engine parameter sets and the baseline engine variant are reported at step 730 as output 735, and a new engine variant is created with the parameter set of the best score at step 740. If a baseline engine variant is present, a new engine variant is created only if the best score is better than the baseline engine variant's score. This illustrative example shows the tuning of engine parameter sets; the Data source/data preparator, Algorithm, Serving, and Evaluation components and their parameters can all be tuned in this manner as presented herein.

FIG. 7B illustrates an exemplary implementation of the engine parameter tuning process shown in FIG. 7A as a flow diagram 750. At step 755, a first list of engine parameter sets is generated, evaluated, and the corresponding results are stored. The iteration index n is set to 2. The first or initial list of engine parameter sets may be generated from a base engine parameter set, or a base list of engine parameter sets, where the base engine parameter set or base list of engine parameter sets may take on stored default values, or may be derived from previous engine parameter set tuning and evaluation steps not shown here. The first list of engine parameter sets may include the base engine parameter set or the base list of engine parameter sets. At step 760, the n-th list of engine parameter sets is generated, based on evaluation results of the (n−1)-th list of engine parameter sets. Alternatively, the n-th list of engine parameter sets may be generated based on evaluation results of all (n−1) previous lists of engine parameter sets. Evaluation of the n-th list of parameter sets is carried out at step 765, and the evaluation results are stored in addition to the n-th list of engine parameter sets itself, for later reporting. If neither a maximum number of tests MAX_N nor timeout has been reached at step 770, the parameter generation and evaluation processes continue through step 772. Otherwise, the presence of an optional baseline engine variant is considered at step 775. Without a baseline engine variant, the parameter sets and corresponding evaluation results are reported at step 780, and a new engine variant with the parameter set of the best score is created at step 785 before the tuning process terminates at step 790. If a baseline engine variant is present, it is evaluated at step 776, its evaluation result is reported at step 777 and compared to the best score from the list of parameter sets at step 778. A new engine variant is then created only if the best score is better than the score of the baseline engine variant. In addition to the process shown in flow chart 750, alternative implementations of the use flow 700 are also possible.

Prediction History Tracking

In addition to evaluating the performance of predictive engines and tuning engine parameter sets, a PredictionIO platform may record actual results, including subsequent user actions, actual correct results, or actual information of the previously unknown event now revealed, after a prediction has been made. Thus, prediction history can be tracked for updating predictive engines during deployment. Such prediction history tracking may be performed in real-time, with live evaluation results returned as feedback to predictive engines for further engine parameter tuning and prediction accuracy improvement. Prediction history may also be individually or collectively replayed to operators of predictive engines for troubleshooting purposes.

In some embodiments, a PredictionIO server generates and logs a unique tracking tag for each user query. Correspondingly, predicted results generated in response to the current query and parameters of the engine variant deployed are associated with the same tracking tag. A tracking tag may be an alphanumerical string, such as “X” or “X1”, a tuple of alphanumerical strings such as “(X, 1)”, or any other identifier capable of identifying individual queries. Recall that in some embodiments, a query may include identifying information including user ID, product ID, time, and location. Similarly, a tracking tag may be in the form of (user-device ID, user ID, time stamp). Subsequent actual results, including user actions and behaviors, and actual correct results revealed after the prediction result has been served, are also logged under the same tracking tag. As a result, prediction results and actual results can be segmented or categorized according to identifying information such as product name, time, day of week, user categories, and/or attributes. User actions and/or behaviors may be monitored over a long period of time such as several hours, days, or even months. User actions or behaviors may also be logged as sequences instead of a set of individual events. For example, a user may click on five products before purchasing a particular product. All five user clicks and the purchase may be viewed together as a sequence of user actions. User actions or behaviors may also be further segmented according to connection sessions or even browsing windows. For example, user actions performed on one webpage may be recorded separately from user actions performed on another webpage, or they can be combined under the same user ID. Collectively, such tracking data as identified by the possibly unique tracking tag can be replayed to a developer of a predictive engine automatically or upon request to assist in improving and understanding the performance of predictive engines. Tracking tags are thus also called replay tags. As previously discussed, a “user” refers to any entity that interacts with a PredictionIO Server or predictive engines, and may or may not be a person.
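For illustration, tracking-tag bookkeeping of this kind might be structured as in the following Scala sketch; the record shapes, field names, and methods are hypothetical assumptions, not the actual PredictionIO interfaces.

// Hypothetical record types; all field and method names are illustrative.
case class ReplayTag(userDeviceId: String, userId: String, timestamp: Long)
case class TrackingRecord(tag: ReplayTag, query: String, predictedResult: String,
                          engineVariant: String, actualResults: List[String])

object TrackingLog {
  private var records = Map.empty[ReplayTag, TrackingRecord]

  // Log a query and its prediction under a fresh replay tag; the tag is
  // returned so it can be served alongside the predicted result.
  def logPrediction(deviceId: String, userId: String, query: String,
                    predicted: String, variant: String): ReplayTag = {
    val tag = ReplayTag(deviceId, userId, System.currentTimeMillis)
    records += tag -> TrackingRecord(tag, query, predicted, variant, Nil)
    tag
  }

  // Subsequent user actions arrive tagged with the same replay tag and are
  // appended in order, preserving the sequence-of-actions logging above.
  def logActualResult(tag: ReplayTag, action: String): Unit =
    records.get(tag).foreach { r =>
      records += tag -> r.copy(actualResults = r.actualResults :+ action)
    }
}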

More specifically, a PredictionIO server may include a replay loop to perform live evaluation of predictive engines in great detail and with a high level of accuracy. In some embodiments, a PredictionIO server provides a special data source (data reader) or event datastore that can use the tracking data to replay how a prediction engine performs. This data source is able to reconstruct the complete history of each user that queries the system. In addition to tracking tags specific to individual queries, other types of data characteristics or meta-data can be employed to group and sort tracking data. Such meta-data may or may not be part of the tracking tags themselves. A replay loop may be displayed graphically or textually to a developer of the system or an operator of the replay loop. Exemplary displays include event logs and graphs, time-series plots, performance curves, charts, and so on. The PredictionIO server may also provide a special evaluator component that takes the complete history of each user and produces accurate and detailed reports of how each prediction performed. Besides obtaining a better picture of how the prediction engine performs in contrast to black-box tests, this level of detail enables fine tuning and troubleshooting of the prediction engine by data scientists and engine developers.

FIG. 8 is an illustrative diagram 800 showing a PredictionIO platform 805 in the process of evaluating and tuning two engine variants, according to one embodiment of the present invention. Other than user application 880, all components shown in FIG. 8 may be implemented as part of a PredictionIO platform 805. A distributed implementation is also possible.

In this embodiment, two variants of a predictive engine E are deployed through a PredictionIO platform. Each of the two variants receives queries from a user application and generates predicted results. Such predicted results are tagged with tracking or replay IDs, and are subsequently evaluated, with their corresponding engine parameter sets tuned to generate two new variants of the predictive engine E. An engine variant is a deployable instance of a predictive engine specified by an engine parameter set. In FIG. 8, the first variant 820 of engine E 810 is specified by engine parameter set 813, while the second variant 822 of engine E 810 is specified by engine parameter set 814.

An exemplary value of the parameter set 813 is as follows:

Parameter Set 813 {
    DataSource: x2
    AlgorithmList:
        Algorithm 4:
            AlgoParam1: b1
            AlgoParam2: a2
        Algorithm 2:
            AlgoParamY: 33
}

Parameter set 813 states that variant 820 uses DataSource x2, and Algorithms 4 and 2. The values of algorithm parameter1 and algorithm parameter2 of Algorithm 4 are set to b1 and a2 respectively, while the value of the parameter Y of Algorithm 2 is set to 33.

Similarly, an exemplary value of the parameter set 814 is as follows:

Parameter Set 814 {
    DataSource: x1
    AlgorithmList:
        Algorithm 1:
            AlgoParam1: a1
            AlgoParam2: a2
        Algorithm 2:
            AlgoParamZ: 23
}

Parameter set 814 states that variant 822 uses DataSource x1, and Algorithms 1 and 2. The values of algorithm parameter1 and algorithm parameter2 of Algorithm 1 are set to a1 and a2, while the value of the parameter Z of Algorithm 2 is set to 23.

In various embodiments of the present invention, the evaluation and tuning processes may start at either deployment platform 812 or user application 880. For example, after deployment platform 812 deploys engine variant 820 and engine variant 822, user application 880 may send three queries Q1, Q2, and Q3 (882) to PredictionIO platform 805. In some embodiments, a query may include identifying information including user ID, product ID, time, and location. A split test controller 860 determines which deployed variant each query is transferred to. In some embodiments, a single query may be transferred to more than one deployed engine variant. In this example, queries Q1 and Q3 (821) are passed to first variant 820, while query Q2 (823) is passed to second variant 822. Deployed engine variant 820 then generates predicted results 824, including predicted result P1 with replay ID X, and predicted result P3 with replay ID Z. Replay IDs in this example are alphanumeric tracking tags specific to individual queries. Similarly, deployed engine variant 822 generates predicted results 825, including predicted result P2 with replay ID Y. Predicted results 824 and 825 are then passed back to split test controller 860, to be exported as output 886 to user application 880. In embodiments where more than one user application is present, the split test controller may track which user application a particular query has been generated from, and to which the corresponding predicted results should be transferred. In some embodiments, predicted results may be served to user applications other than the one where the queries have been generated.

In addition to passing predicted results to the split test controller, each deployed engine variant 820 and 822 also passes data 815 and 884 to datastore 830 in the example shown in FIG. 8. Data 815 include two sets of tracking data, one specified by replay ID X and one specified by replay ID Z. The first set of tracking data specified by replay ID X includes query Q1, predicted result P1, and a description of engine variant V1. This description of engine variant V1 may be engine parameter set 813 itself, or some meta-data that uniquely identifies engine parameter set 813 to event datastore 830. Similarly, the second set of tracking data specified by replay ID Z includes query Q3, predicted result P3, and a description of engine variant V1. Data 884 include a single set of tracking data specified by replay ID Y, comprising query Q2, predicted result P2, and a description of engine variant V2.

In this embodiment, at user application 880, user actions and/or behaviors collected subsequent to receiving predicted results P1, P2, and P3 (886) from PredictionIO platform 805 are considered as actual results A1, A2, and A3 (884) respectively, and tagged with corresponding replay IDs. Such user actions may be collected in real-time, or over a given time span such as a few hours, a day, or a week. Recall that each query evokes a prediction process to generate a predicted result, and each query is uniquely identified by a replay ID. Hence, multiple user actions or actual results corresponding to a particular query with a given replay ID may be tagged with the same replay ID. For example, actual result A1 shown in FIG. 8 may represent a sequence of user clicks and browsed product pages, all corresponding to query Q1, product recommendation P1, and replay ID X.

After actual results 884 are transferred to datastore 830, engine variant parameter sets, queries, predicted results, and actual results corresponding to the same replay ID are aggregated within datastore 830, using the data source (data reader) or event datastore mentioned above. Aggregated data sets 832 are sent to evaluator 840 for evaluation. In this embodiment, two metrics 842 and 844 are used within evaluator 840, individually or in combination. Evaluation results are sent to auto parameter tuning variant generator 850. Auto parameter tuning variant generator 850 functions in cooperation with evaluator 840 according to one of the processes discussed with reference to FIGS. 5A to 7B, before outputting updated engine parameter sets 852 that specify two new variants V3 and V4 for Engine E. The newly generated engine variants may be subsequently deployed by deployment platform 812. The cycle of prediction, evaluation, and auto parameter tuning continues as more user queries are imported into the system.
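A minimal sketch of this aggregation step, assuming a simplified event shape in which each stored event carries a replay ID and an optional predicted or actual result, might look as follows; evaluator 840 would apply metrics such as 842 and 844 in place of the generic metric function.

object ReplayAggregation {
  // Illustrative event shape; real tracking data would also carry the
  // query and engine variant description.
  case class TaggedEvent(replayId: String, predicted: Option[String], actual: Option[String])

  // Group stored events by replay ID and average a metric over every
  // (predicted, actual) pair that can be paired up.
  def aggregateAndScore(events: Seq[TaggedEvent],
                        metric: (String, String) => Double): Double = {
    val pairs = events.groupBy(_.replayId).values.flatMap { group =>
      for {
        p <- group.flatMap(_.predicted).headOption
        a <- group.flatMap(_.actual).headOption
      } yield (p, a)
    }
    if (pairs.isEmpty) 0.0
    else pairs.map { case (p, a) => metric(p, a) }.sum / pairs.size
  }
}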

In some embodiments, engine variant V3 is generated based on engine variant V1 alone, and engine variant V4 is generated based on engine variant V2 alone. In some embodiments, both engine variants V3 and V4 are generated based on both engine variants V1 and V2. For example, as part of evaluator 840 or auto parameter tuning variant generator 850, variants V1 and V2 of engine E 810 may be compared according to computed metrics 842 and 844. Such pair-wise comparison may provide a better-performing engine variant, the engine parameter set of which may in turn serve as a base parameter set for generating new variants V3 and V4. In another example, more than two variants may be deployed and evaluated at the same time. Evaluator 840 may sort or rank the performances of such multiple engine variants, with pair-wise or multiple-way comparisons, before generating new engine variants for further deployment and evaluation.

In some embodiments, one or more new engine variants may be determined manually by an operator. For example, the operator may examine evaluation results output by evaluator 840, and manually input a new set of engine parameters as new engine variant V3. In another example, the operator may directly modify the output of auto parameter tuning variant generator 850.

In addition to auto parameter tuning, a developer of the predictive engine E or an operator of the replay loop as shown in FIG. 8 may prefer to examine prediction history to tune engine parameter sets directly and to troubleshoot issues in predictive engine design. For example, PredictionIO platform 805 may include an interface, or a hook to such an interface, for users or operators to provide actual results directly. PredictionIO platform 805 may also allow operators to tag debugging information, so each prediction will have debugging information that can be examined using a Replay feature as will be discussed next. Visual replay 890 may replay tracking data from datastore 830 and available debugging information to operators, thus providing insights into the selection and tuning of data sources, algorithms, algorithm parameters, as well as other engine parameters that may affect the performance of a predictive engine. Such extensive replay of prediction history allows operators to understand and deduce why particular prediction results are generated and how prediction performance can be improved.

Replay Examples

The present invention allows users to replay prediction scenarios to analyze, visualize, and detect the change of prediction accuracy over various segmentations, such as time. Take the following three types of prediction problems as examples, shown in Table 1.

TABLE 1 Replay Examples

     Query            Predicted Result   Actual Result (or user actual action)
  1  Text             Suggestion         Complaint
  2  <P1, P2, P3>     <P10, P11>         <P10, P20>
  3  <user id>        <P10, P11>         <P10, P20>

The examples shown in Table 1 correspond to:

1. Classification. Given a document of text body, predict whether it is a suggestion or a complaint.
2. Similar item recommendation. Given a list of items, predict which other ones are similar to them.
3. Personalized recommendation. Given a user id, predict which items the user will be inclined to take actions on.

The Replay process may further allow operators to visualize the predicted results with actual results during the evaluation phase.

Replay for Performance Analysis and Monitoring

As prediction history and tracking data are collected and stored, prediction scenarios may be replayed and the complete prediction history of each user that queries the system may be reconstructed, allowing operators of the replay process to analyze, visualize, and detect changes of prediction accuracy over various segmentations, such as different time periods. Recall from the discussion of evaluator 450 in FIG. 4 that actual results such as actual user behaviors may be received from a datastore or a user application during the evaluation phase. Such actual results may be visualized with predicted results through visual replay 890 for comparative purposes. Given a particular replay ID, visual replay 890 may retrieve and selectively display the associated query, predicted result, actual result, additional auxiliary user information or meta-data, and possibly the corresponding engine variant as given by the engine parameter set. In some embodiments, a selected subset of tracking data may be visually displayed, where the subset is pre-defined or manually configured by an operator of visual replay 890. Patterns, anomalies, and trends in tracking data may thus be analyzed by the system or by the operator directly. A replay of prediction history or engine performance may or may not be followed by further engine parameter tuning processes.

As the cycle of prediction, evaluation, and auto parameter tuning takes place, visual replay 890 may function as a task monitor, allowing the operator to selectively and incrementally view tracking data thus collected. In some embodiments, operators can be notified when user conversion (decision to purchase) drops below a certain predefined threshold for a particular engine or engine variant. The operator can then utilize the replay feature of the PredictionIO platform for troubleshooting and continuous prediction performance monitoring.

FIG. 9 is an exemplary graph 900 of actual results, in this case, actual user actions recorded over a given time period, according to one embodiment of the present invention. In this particular visualization example, user actions are plotted between a starting time 920 at 11:30 pm on a given Monday and an end time 925 at 11:35 pm on the same Monday. Each data point on the plot represents a particular user action or event that occurred after a target prediction has been made in response to a user query with a replay ID. Tracking data 950 are displayed on the graph to show that the plotted actual user actions are taken by user 435, after an engine variant 1 has been employed to make predictions. Alternatively, a replay ID or the engine parameter set may be displayed. In some embodiments, the replay ID may comprise the displayed user ID, engine variant, and a given time span. In other words, visual replay of tracking data may be based on user segments. In this particular example, tracking data 960 are displayed next to a data point to indicate that a click event has been detected and assigned an item ID of 34324.

In this example, actual user actions over a five-minute time period of segmentation are plotted. In some embodiments, actual results or other types of tracking data may be plotted over shorter or longer time segmentations. In some embodiments, tracking data associated with multiple users, multiple queries, or multiple replay IDs are plotted on the same graph. Moreover, data may be grouped by cohort, session, and other types of data characteristics. The PredictionIO platform may automatically detect patterns in tracking data, and cluster them accordingly. On the other hand, operators may specify desired groupings directly. For example, operators can select a specific user and session, to see all the events associated with the user or session.

In addition to displaying tracking data directly, the PredictionIO platform may produce detailed reports on prediction histories, enabling the further fine tuning of prediction engines.

FIGS. 10 and 11 are two illustrative plots showing how reports of prediction results may be viewed graphically, according to illustrative embodiments of the present invention. FIG. 10 shows the number of prediction successes and failures over a four-day time-span. In this example, the horizontal time axis 1010 is divided into individual days, while the vertical axis 1020 represents the number of occurrences. The piecewise-linear success and failure curves may refer to a particular engine variant, or all variants of a particular predictive engine. In some embodiments, vertical axis 1020 may be set in a percentage scale or a log scale. In addition to graphical representations, this report of prediction results may alternatively be generated as a table.

An operator of the replay process may further zoom in and out of a certain time period such as a single day, as indicated by lines 1030 and 1035, to examine additional details and to further troubleshoot issues in predictive engine design and engine parameter tuning. Although only four data points are shown for each time-series data curve in FIG. 10, in some embodiments, the number of prediction successes and failures may be statistically summarized over strategically generated samples and time-spans. The PredictionIO platform may provide default values for the time scale. In some embodiments, the PredictionIO platform may take into account the amount of data available to dynamically determine optimal time scale values for binning purposes. In yet some other embodiments, the PredictionIO platform may further generate and display linear or non-linear regression curves to model the observed tracking data. The “Success” and “Failure” metrics shown here are two examples of statistics useful for analyzing prediction performance. Operators may define additional metrics such as success rates and confidence statistics; more than two metrics may be provided in a report and shown graphically in a visualization.

As previously discussed, data may be grouped by cohort, session, and other types of data characteristics in generating useful statistics for analyzing prediction results. FIG. 11 is a bar chart of prediction successes and failures plotted against different genders. By considering different genders separately, it becomes clear that the current engine or engine variant under consideration is tailored more toward male users than female users. Consequently, an operator or developer may decide to include gender as an additional variable in the predictive model. In some embodiments, other types of charts such as histograms and scatter plots may be displayed.

Data Augmentation

In FIG. 11, success and failure metrics are plotted against different genders. In some embodiments, the PredictionIO platform provides a data augmentation feature for augmenting available user data with additional information such as gender. For example, external information to be augmented may include ZIP code, age group, ethnicity, occupation, and family size. Additional information to be augmented may also be mined from behavior data. For example, users may be classified into high-spending and low-spending groups, or frequent on-line shopping or non-frequent on-line shopping groups. Data augmentation provides new ways of categorizing tracking data for better performance monitoring and analysis.

Support for Multiple Experiments

Recall from the discussion with reference to FIG. 8 that multiple engine variants may be tested and studied at the same time, with a split test controller determining which engine variant a user query is dispatched to. Similarly, FIG. 12 shows a system 1200 for testing multiple engine variants at the same time, according to an illustrative embodiment of the present invention.

In system 1200, input user traffic 1210 may be allocated dynamically through forward 1220, based on the performance of each engine variant under consideration. For example, initially, half of new user traffic or queries 1210 may be directed to the predictive engine 1240, while the remaining half are simply stored and thus not directed to a predictive engine, as indicated by the No Engine placeholder 1230. In some embodiments, forward 1220 is a split test controller similar to component 860 shown in FIG. 8. Predictive traffic through predictive engine 1240 may be equally shared among its three variants 1242, 1244, and 1246. Thus, each engine variant takes on one-sixth of the overall user traffic. Over time, it may be determined that a specific variant such as engine variant 1242 provides higher prediction accuracy. As a result, forward 1220 may automatically direct more than one-sixth of overall traffic to engine variant 1242 to optimize overall system performance. The PredictionIO platform seeks to strike a balance between exploration and exploitation. In yet some other embodiments, forward 1220 may direct the same predictive traffic to multiple engine variants, thus enabling direct comparison of prediction results and prediction accuracy across the multiple engine variants.
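The exploration-exploitation balance mentioned above can be illustrated with a minimal epsilon-greedy sketch; the variant identifiers, the single running accuracy score per variant, and the epsilon value are assumptions for illustration, and the actual forward 1220 may use a different allocation policy.

import scala.util.Random

class Splitter(variants: Vector[String], epsilon: Double = 0.2) {
  private val accuracy = scala.collection.mutable.Map(variants.map(_ -> 0.0): _*)

  // Route a query: explore a random variant with probability epsilon,
  // otherwise exploit the variant with the best evaluation score so far.
  def route(query: String): String =
    if (Random.nextDouble() < epsilon) variants(Random.nextInt(variants.size))
    else accuracy.maxBy(_._2)._1

  // Update the running score when a new evaluation result arrives.
  def update(variant: String, score: Double): Unit = accuracy(variant) = score
}

Here a fraction epsilon of traffic continues to explore all variants uniformly, while the remainder is routed to the variant with the best evaluation score observed so far.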

In some embodiments, a PredictionIO platform may deploy multiple engine variants with initial sets of engine parameters or initial engine parameter settings. The deployed engine variants then receive queries, as allocated by a splitter, and respond with predicted results. Corresponding actual results are also received. Evaluation results are then generated, and the current engine parameter sets and evaluation results are passed to an engine parameter generator. From time to time, the engine parameter generator generates one or more new parameter sets based on evaluation results of the current variants, and sometimes, evaluation results of some or all previously deployed variants. Such previously deployed variants may have been replaced by previously generated new engine parameter sets, and evaluation results of previously deployed variants may have been stored by the PredictionIO platform. The one or more new engine parameter sets generated in the current round may then be deployed to replace the existing engine variants.

In yet other embodiments, a PredictionIO platform may perform evaluation, tuning, and/or comparison of multiple engines. For example, multiple engines may be implemented by different developers and data scientists for a particular prediction problem, such as classification of incoming mail as spam or non-spam, or recommendation of similar items. A PredictionIO platform may provide, to externally or internally implemented predictive engines, engine evaluation, engine parameter set tuning, prediction history tracking, and replay services as discussed throughout the current disclosure. For multiple engines targeting the same prediction problem, the PredictionIO platform may serve as an interface for cross-comparison and engine selection. For multiple engines targeting different prediction problems based on queries from the same user, the PredictionIO platform may serve as an interface for cross-examination, selection, and aggregation.

Visual Replay

In addition to the illustrative plots shown in FIGS. 9, 10, and 11, FIGS. 13-18 provide illustrative visual displays of prediction performances over one or more replay groups. A replay group refers to a pre-defined or operator-defined segment of queries that satisfy one or more conditions as provided through query segment filters. Replay groups may be created for textual or visual displays. Examples of query segment filters include engine variant filters, user attribute filters, item attribute filters, query attribute filters, and other property filters or conditional filters capable of selecting a subset of available queries for performance analysis and monitoring. For example, an engine variant filter may select queries that have been, or will be, processed through a given engine variant, and a single query may be assigned to multiple replay groups if it has been or will be processed through multiple engine variants; a user attribute filter may be applied if queries contain at least a user, and may be used to select queries associated with users in a particular age group; an item attribute filter may be applied if queries contain at least an item; and a query time attribute filter may be applied if queries have associated timestamps. Multiple query segment filters may be used jointly, and filtered results may be combined as intersections or unions of query segments. Query segment filters may be pre-defined or operator-defined, and may be applied automatically or upon request by an operator. In addition, since query segment filters select subsets of queries without necessarily affecting the prediction process, they may be applied during any stage of the predictive engine tuning, evaluation, and replay process. In one example, a query segment filter may be applied to a query as the query is received from an end-user device, before the prediction process takes place. In another example, a query segment filter may be applied to stored queries or query records after predictions have been made already. Each query may be associated with one or more replay group IDs as query segment filters are applied.

As a more specific example, a recommendation engine may be deployed as an Engine Variant e_v_100, with an initial or default engine parameter set. A query asking this engine to recommend five products to a user 123 when the user is in San Francisco may look like [userid=123, city=SF, num=5]. Since userid refers to a user, a filter of a new replay group for Engine Variant e_v_100 may have user attribute options. User attributes can be anything that the system has stored about users, for instance, age, gender, sign-up date, plan or service a user has signed up for, range of user ids, dates, and so on. If the system contains users' behavior data, the filter can even go further to select queries that have targeted users who have performed certain actions during a certain time range. For example, one or more filters may be applied to generate a replay group by selecting queries for recommending five products to female users when they are in San Francisco.
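As a non-limiting sketch, query segment filters of this kind may be modeled as composable predicates; the query fields shown here are illustrative assumptions, since real queries may carry arbitrary attributes.

object SegmentFilters {
  case class Query(userId: Int, city: String, gender: String, age: Int)

  type SegmentFilter = Query => Boolean

  val femaleFilter: SegmentFilter = _.gender == "female"
  val sfFilter: SegmentFilter = _.city == "SF"

  // Filtered results may be combined as intersections or unions of segments.
  def and(f: SegmentFilter, g: SegmentFilter): SegmentFilter = q => f(q) && g(q)
  def or(f: SegmentFilter, g: SegmentFilter): SegmentFilter = q => f(q) || g(q)

  // E.g., the replay group of queries targeting female users in San Francisco.
  def replayGroup(queries: Seq[Query]): Seq[Query] =
    queries.filter(and(femaleFilter, sfFilter))
}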

FIG. 13 shows an illustrative visual display of prediction performances of a predictive engine over a replay group, according to one embodiment of the present invention. In this example, performance of the prediction process evoked in response to a given query is quantified, or quantitatively represented, by a prediction score. A prediction score may be calculated by at least one pre-defined or operator-defined score function based on the predicted result(s) and actual result(s) associated with the query. Generally, the deployed engine variant, derived predicted results, actual results, and corresponding computed prediction scores are all associated with the replay ID specific to the given query. In some embodiments, the prediction score is computed by invoking a score function using a score_function(PredictedResult, ActualResult) command. A score function may also take on additional inputs that further configure the score computation process. Different score functions may be provided by a PredictionIO platform. In some embodiments, an operator may define multiple score functions, and each replay group may have more than one set of prediction scores.

Depending on how such score functions are defined, computed prediction scores may take on both positive and negative values in some embodiments, but be non-negative in some other embodiments. Computed prediction scores may also be normalized, and may take on continuous or discrete values. For example, consider an input predicted result containing two items, such as (P10, P11), and an input actual result also containing two items. In some embodiments, a score function may return a value of 1 if the input actual result is exactly the same, i.e., (P10, P11), and 0 otherwise. In some embodiments, a score function may return a score of 0, 1, or 2, depending on the number of overlapping items from the predicted result and the actual result. Such a score may also be normalized to 0, 0.5, or 1, representing the percentage of correctly predicted items.
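The score functions described in this example may be sketched as follows; modeling results as item sets is an assumption for illustration only.

object ScoreFunctions {
  // Returns 1 only when the actual result exactly matches the prediction.
  def exactMatchScore(predicted: Set[String], actual: Set[String]): Double =
    if (predicted == actual) 1.0 else 0.0

  // Counts overlapping items, e.g. 0, 1, or 2 for two-item results.
  def overlapScore(predicted: Set[String], actual: Set[String]): Double =
    (predicted intersect actual).size.toDouble

  // Normalizes the overlap to the fraction of correctly predicted items.
  def normalizedOverlapScore(predicted: Set[String], actual: Set[String]): Double =
    if (predicted.isEmpty) 0.0
    else (predicted intersect actual).size.toDouble / predicted.size
}

For the predicted result (P10, P11) and the actual result (P10, P20), overlapScore returns 1 and normalizedOverlapScore returns 0.5, matching the normalized values described above.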

In this and subsequent illustrative examples shown in FIGS. 13 to 18, prediction performances are plotted in terms of accumulated prediction scores over time. Here an accumulated prediction score is calculated by an accumulation function that summarizes the prediction scores of all queries of a replay group within defined time intervals over a given time period. For example, each query may have an associated timestamp, representing the time at which the query was received by the predictive engine. According to such timestamps, queries within a replay group may be segmented for computing accumulated prediction scores. In another example, a timestamp may represent when a prediction has been made, or a sign-up date/time at which a user has signed up for prediction service. Generally, computation of accumulated prediction scores may be carried out over any categorization or segmentation of queries within a replay group. Furthermore, when multiple score functions are defined, multiple accumulated scores may be displayed on the same visualization chart or on separate charts.

FIG. 13 shows an illustrative visual chart 1300 of prediction scores accumulated over two-day intervals during the month of January, 2015 for Replay Group 1. Data points have been connected to generate a piecewise-linear curve 1350. The horizontal axis 1310 with label 1315 shows the time period of interest, between Jan. 1, 2015 inclusive, and Jan. 31, 2015 exclusive. In some embodiments, this time period of interest may cover one or more specific dates, consecutive or non-consecutive, or a range of dates. The vertical axis 1320 with label 1325 refers to accumulated prediction scores. Recall that each query may have a timestamp indicating the time and/or date at which the query has been received or when a prediction has been made by a PredictionIO platform in response to the query. Although not shown explicitly here, Replay Group 1 may have been obtained through a query segment filter that selects all queries with timestamps within January, 2015. In FIG. 13, data point 1340 is the prediction score accumulated over all queries with a timestamp between time 1342 (Jan. 19, 2015) inclusive, and time 1344 (Jan. 21, 2015) exclusive. Time intervals such as the one between time 1342 and time 1344 represent how the system groups queries together over the whole time period of January, 2015. In a similar example, queries may be grouped into one-day intervals over a four-day period, and the prediction score may be defined to take on the value of 1 or 0 depending on whether an input prediction result is the same as an input actual result. The resulting plot of accumulated scores would then be similar to the success curve shown in FIG. 10.

An operator of the replay process may zoom in and out of the time period shown in FIG. 13, to examine additional details in the prediction performance visualization, thus further troubleshooting issues in predictive engine design. For example, although prediction scores are accumulated over two-day intervals during a single month in FIG. 13, in some embodiments, the system may allow an operator to manually configure the time interval(s) and time period for plotting. The PredictionIO platform may also take into account the amount of data available to dynamically determine optimal time intervals for prediction score accumulation and visualization.

In some other embodiments, Replay Group 1 may be generated by selecting queries containing users who have signed up for prediction service during January, 2015. Generally, the time period 1315 may refer to any time-related query attribute. In other embodiments, prediction scores may be accumulated over different categories such as user gender, leading to accumulated score plots similar to the diagram shown in FIG. 11. Moreover, although accumulation has referred to a direct summation operation in generating the plot shown in FIG. 13, in some embodiments, accumulation may refer to other algebraic or statistical operations such as averaging, weighted summation, and such. A direct summation operation is a weighted summation with weights equal to 1. An averaging operation is a weighted summation with weights equal to the reciprocal of the number of queries. A statistical sampling process followed by direct summation may be considered a weighted summation with weights equal to 1 or 0. Non-linear weighting is also possible in some embodiments of the present invention.
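A minimal sketch of such an accumulation function, bucketing scores into fixed-width time intervals and combining each bucket by a weighted summation, might read as follows; the types and interval representation are assumptions for illustration.

object Accumulation {
  // One prediction score per query, keyed by the query's timestamp.
  case class ScoredQuery(timestamp: Long, score: Double)

  // The weights function receives the bucket size n and returns the weight
  // applied to each score in that bucket.
  def accumulate(queries: Seq[ScoredQuery],
                 intervalMillis: Long,
                 weights: Int => Double): Map[Long, Double] =
    queries.groupBy(_.timestamp / intervalMillis).map { case (bucket, qs) =>
      val w = weights(qs.size)
      bucket -> qs.map(_.score * w).sum
    }
}

Passing weights = _ => 1.0 yields the direct summation described above, while weights = n => 1.0 / n yields the averaging operation.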

In FIG. 13, only a single replay group has been visualized as curve 1350 and labeled by legend 1330. FIG. 14 shows an illustrative visual display 1400 of prediction performances over two replay groups, according to one embodiment of the present invention. In addition to Replay Group 1 as represented by the curve 1450, accumulated scores for queries within Replay Group 2 are visualized as curve 1460. Both replay groups are labeled by legend 1430. In addition, visual display 1400 includes three checkboxes 1442, 1444, and 1472, placed below the plotting window. Checking and un-checking boxes 1442 and 1444 turns the display of curves 1450 and 1460 on and off respectively. Box 1472 provides a “Whole Period” option, which sets the time interval for prediction score accumulation to the entire time period of interest. Checking box 1472 turns each of curves 1450 and 1460 into a single data point. In other words, under the whole period option, all queries within the time period of the chart would be summarized to generate a single accumulated prediction score.

FIG. 15 shows an illustrative visual display 1500 of prediction performances over a replay group created using query segment filters, according to one embodiment of the present invention. In this embodiment, the visual display 1500 is divided into two windows: plotting window 1505 for visualizing accumulated prediction scores, and interactive display 1560 that allows an operator to create Replay Group 1 dynamically for generating curve 1550. Upon initialization, fields in interactive display 1560 may take on default values, which may be pre-defined or may be automatically calculated by the system. Given a deployed engine variant, box or field 1564 allows an operator to assign a name to the engine variant for easy identification. Labels 1572, 1582, and 1592 indicate user attributes that can be set by the operator. Such user attributes may be pre-defined or operator-defined. In addition, the PredictionIO platform may assess all available queries to determine if users are present, and if so, which user attributes are present and can be selected for generating replay groups. In this particular example, age, sign-up date, and gender are three available user attributes. Checkboxes 1574 allow the operator to determine a user age group. In this example, accumulated scores are generated based on users in the below-30 age group, as indicated by the value 30 in field 1575. Boxes 1584 and 1587 allow the operator to select users who have signed up during a particular time period, for example, after Jun. 1, 2014, but before Aug. 1, 2014. Pull-down menus may be activated through buttons 1585 and 1588 to select dates from a calendar. In addition, checkboxes 1593 allow the operator to select both male and female users.

Once user attributes have been input by the operator, Replay Group 1 may be updated automatically, and accumulated prediction scores may be visualized in plotting window 1505. Alternatively, a request for updating the replay group and the corresponding accumulated prediction score visualization may be received by the system when the operator clicks on the “Plot” button 1599.

In some embodiments, operators can create as many replay groups on a visual chart as they like. Each replay group may be created through interfaces similar to interactive display 1560, or may be loaded from storage. Operators can assign a name label to each replay group for easy identification, and can use different colors or symbols for each replay group.

In some embodiments, accumulated prediction scores of one or more replay groups within the time period of interest can be displayed on the visual chart through different graphical representations such as line plots, histograms, bar charts, and scatter plots. For example, FIG. 16 shows an illustrative histogram 1600 representing prediction performances over two replay groups, according to one embodiment of the present invention. The same Replay Groups 1 and 2 from FIG. 14 are shown here. Each bar, such as bars 1640 and 1650, corresponds to prediction scores accumulated over one-week intervals during the one-month period of January, 2015.

Although not shown explicitly in FIGS. 13-16, in some embodiments, an operator may manually adjust the values of the time period and time interval, as well as definitions for the score function and accumulation function. The visual chart may be updated automatically once these values are changed, or upon request when such requests are received from the operator.

In addition, FIGS. 17 and 18 show illustrative visual displays of prediction performances over multiple replay groups, according to embodiments of the present invention. In FIG. 17, visualization 1700 shows how well one engine variant performs over a given one-month period for three different user segments divided by age groups. Curves 1750, 1760, and 1770 correspond to Replay Groups 1, 2, and 3 respectively, as indicated by legend 1730. Queries are divided into below-30, 30-to-60, and above-60 age groups, and queries within each replay group are processed through engine variant e_v_111. In some embodiments, Replay Groups 1, 2, and 3 are generated by applying a user attribute filter that examines the user age attribute. All queries within each replay group are processed through engine variant e_v_111, either before or after the user attribute filter is applied.

In FIG. 18, visualization 1800 compares how three engine variants perform over a given one-month period for the below-30 age group. Curves 1850, 1860, and 1870 correspond to Replay Groups 1, 2, and 3 respectively, as indicated by legend 1830. In some embodiments, Replay Groups 1, 2, and 3 are obtained by applying a user attribute filter as well as an engine variant filter. Once a query is processed by an engine variant to generate a corresponding predicted result, the query may include the engine variant information as part of the resulting query record. A query record may include the input query, engine variant information, predicted results, actual results, prediction score, and/or any other information relevant to the input query and how the input query has been processed by the prediction system. Thus, a single input query to a predictive engine may lead to multiple query records; and query records corresponding to the same input query may be segmented into different replay groups. An input query may also be associated with multiple replay group IDs, depending on how it is processed by the prediction system.

Detailed Prediction Debugging

Once a visual replay of prediction performances is generated, an operator of the replay process may further zoom in and out, or mouse over the visualization to examine additional details in the prediction process, and hence further troubleshoot issues in predictive engine design. The PredictionIO platform thus provides methods and systems for detailed prediction debugging.

FIG. 19 shows an illustrative visual display 1900 of prediction performances over a replay group, with query records, according to one embodiment of the present invention. In this example, when the operator mouses over or clicks on an accumulated prediction score point such as 1955 of Replay Group 1 on the chart, a floating table 1980 is displayed, showing corresponding query records from Replay Group 1. The query records in table 1980 are those involved in computing the accumulated prediction score represented by data point 1955.

Window 1982 provides a detailed and zoomed-in view of table 1980. In some embodiments, window 1982 may be displayed on its own without the floating table 1980. Label 1984 specifies the time interval and accumulated prediction score associated with data point 1955, and shows that query records displayed in this window have been processed through Engine Variant e_v_111. In this example, query records include attributes such as Query 1985 (Q), Predicted Result 1986 (P), Actual Result 1987 (A), Query Time 1988 (Time), and Prediction Score 1989 (Score). The displayed time interval and engine variant may also be part of the query records. In one specific embodiment, in which no replay ID is utilized, the system may replay based on time or another user-defined condition and display the associated query records. In other embodiments, dedicated replay IDs may be assigned to each individual query or individual query record, and may or may not be displayed with other parts of the query records. A scrolling bar 1990 with up and down arrows allows the operator to scroll through query records when not enough space is available to display all query records at the same time.

FIG. 20 shows an illustrative visual display 2000 of prediction performances over two replay groups, with query records, according to one embodiment of the present invention. When the operator selects a period of time on the chart, for example, between time 2042 and 2044, a table 2080 of query records that fall into this time period is displayed. Window 2082 is a zoomed-in view of table 2080. Displayed in this window are query records from Replay Groups 1 and 2, with attributes such as Query (Q), Predicted Results (P), Actual Results (A), Query Time (Time), and Prediction Score (Score).

In some embodiments, the system also provides statistical features to summarize the prediction performance. For example, the system may automatically select queries with outlier scores in the table. The system also provides statistical information such as the mean, variance, and distribution of the scores. In FIG. 20, label 2086 provides the total number of query records and the average accumulated score across the given time period between time 2042 and 2044.

Some Exemplary Embodiments for Illustrative Purposes

The languages in the examples or elaborations below are context-specific embodiments, and should not be construed to limit the broader spirit of the present invention.

Building a machine learning application from scratch is hard; you need to have the ability to work with your own data and train your algorithm with it, build a layer to serve the prediction results, manage the different algorithms you are running and their evaluations, deploy your application in production, manage the dependencies with your other tools, etc.

The present invention is a Machine Learning server that addresses these concerns. It aims to be the key software stack for data analytics.

Example

Let's take a classic recommender as an example; usually predictive modeling is based on users' behaviors to predict product recommendations.

We will convert the data (in JSON) into binary Avro format.

// Read training data
val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { .. })

which yields something like:

user1 purchases product1, product2

user2 purchases product2

Then build a predictive model with an algorithm:

// collaborative filtering algorithm
val model = ALS.train(trainingData, 10, 20, 0.01)

Then start using the model:

// use the trained model to recommend 5 products for each user
allUsers.foreach { user => model.recommendProducts(user, 5) }

This recommends 5 products for each user.

This code will work in a development environment, but wouldn't work in production because of the following problems:

1. How do you integrate with your existing data?
2. How do you unify the data from multiple sources?
3. How to deploy a scalable service that responds to dynamic prediction queries?
4. How do you persist the predictive model in a distributed environment?
5. How to make your storage layer, Spark, and the algorithms talk to each other?
6. How to prepare the data for model training?
7. How to update the model with new data, without downtime?
8. Where does the business logic get added?
9. How to make the code configurable, reusable and manageable?
10. How do we build these with separation of concerns (SoC), like the web development side of things?
11. How to make things work in a real-time environment?
12. How do I customize the recommender on a per-location basis? How to discard data that is out of inventory?
13. How about performing different tests on the algorithms you selected?

The Present Invention Solves these Problems

PredictionIO boasts an event server for storage that collects data (say, from a mobile app, web, etc.) in a unified way, from multiple channels.

An operator can plug multiple engines into PredictionIO; each engine represents a type of prediction problem. Why is that important?

In a production system, you will typically use multiple engines. Consider the archetypal example of Amazon: if you bought this, recommend that. But you may also run a different algorithm on the front page for article discovery, and another one for email campaigns based on what you browsed, for retargeting purposes.

PredictionIO does that very well.

How to deploy a predictive model service? In a typical mobile app, user actions are sent as behavior data. Your prediction model will be trained on these, and the prediction engine will be deployed as a Web service. So now your mobile app can communicate with the engine via a REST API interface. If this is not sufficient, there are also SDKs available in different languages. The engine will return a list of results in JSON format.

PredictionIO manages the dependencies of SPARK and HBASE and the algorithms automatically. You can launch it with a one-line command.

The framework is written in Scala to take advantage of the JVM support, and is a natural fit for distributed computing. R, in comparison, is not so easy to scale. Also, PredictionIO uses Spark, currently one of the best distributed system frameworks to use, which is proven to scale in production. Algorithms are implemented via MLlib. Lastly, events are stored in Apache HBase as the NoSQL storage layer.

Preparing the Data for Model Training

Preparing the data for model training is a matter of running the event server (launched via 'pio eventserver') and interacting with it by defining the action (e.g., change the product price), the product (e.g., give a rating A for product x), the product name, and attribute names, all in free format.
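For illustration, the free-format purchase event "user1 purchases product1" could be sent to the event server as a JSON document. The sketch below assumes a hypothetical localhost:7070 endpoint, an events.json path, and a placeholder access key; the field names follow the free-format action/product scheme described above and are assumptions for exposition.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object EventExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical free-format event: "user1 purchases product1"
    val event =
      """{"event": "purchase", "entityType": "user", "entityId": "user1",
        | "targetEntityType": "product", "targetEntityId": "product1"}""".stripMargin
    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY")) // placeholder key
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(event))
      .build()
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.statusCode()) // success is indicated by the returned status code
  }
}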

Building the engine is made easy because PredictionIO offers templates for recommendation and classification. The engine is built on an MVC-like architecture and has the following components (a schematic sketch follows the list):

1. Data source: data comes from any data source and is preprocessed automatically into the desired format. Data is prepared and cleansed according to what the engine expects. This follows the separation-of-concerns concept.
2. Algorithms: machine learning algorithms at your disposal to do what you need, with the ability to combine multiple algorithms.
3. Serving layer: the ability to serve results based on predictions, and to add custom business logic to them.
4. Evaluator layer: the ability to evaluate the performance of the prediction in order to compare algorithms.
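To make this separation of concerns concrete, the following is a self-contained schematic sketch of the four components as plain Scala traits. It is illustrative only; the names and signatures are assumptions for exposition and do not reproduce the exact template interfaces.

// Schematic DASE-style engine skeleton (illustrative, not the real API)
case class Query(user: String, num: Int)
case class PredictedResult(products: Seq[String])

trait DataSource[TD] { def readTraining(): TD }            // D: where data comes from
trait Algorithm[TD, M] {                                   // A: how a model is built and used
  def train(data: TD): M
  def predict(model: M, query: Query): PredictedResult
}
trait Serving {                                            // S: business logic on top of predictions
  def serve(query: Query, predictions: Seq[PredictedResult]): PredictedResult
}
trait Evaluator[R] {                                       // E: scoring predictions against actuals
  def evaluate(query: Query, predicted: PredictedResult, actual: PredictedResult): R
}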

Live Evaluation

PredictionIO Enterprise Edition is capable of performing live evaluation of its prediction performance. This is far more accurate because it can track all subsequent actions of a user after a prediction has been presented to that user.

Architecture

PredictionIO has two types of deployable servers: the event server and the prediction engine server. In live evaluation mode, a prediction engine server performs the following additional actions per query:

- generates a unique tracking tag for the current query;
- logs the current query, the predictions for the current query, and the unique tracking tag; and
- presents the predictions and the unique tracking tag to the user.

Subsequent actions of the user will be logged and tracked using the aforementioned unique tracking tag. This is called the "tracking data."
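A minimal sketch of this per-query tagging flow is shown below, reusing the Query and PredictedResult types from the earlier component sketch; TrackedRecord and the log callback are hypothetical stand-ins for the event server.

import java.util.UUID

// Hypothetical record linking a query, its predictions, and the tracking tag
final case class TrackedRecord(tag: String, query: Query, predictions: PredictedResult)

def serveWithTracking(query: Query,
                      predict: Query => PredictedResult,
                      log: TrackedRecord => Unit): (PredictedResult, String) = {
  val tag = UUID.randomUUID().toString        // unique tracking tag for this query
  val predictions = predict(query)
  log(TrackedRecord(tag, query, predictions)) // log query, predictions, and tag
  (predictions, tag)                          // present predictions and tag to the user
}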

Replay Loop

Utilizing the above features, the present inventors built a replay loop on top of them to perform live evaluation of prediction engines with an accuracy and a level of detail that A/B testing or offline evaluation would not otherwise be able to provide.

PredictionIO Enterprise Edition provides a special data source (data reader) that can use the "tracking data" to replay how a prediction engine performs. This data source is able to reconstruct the complete history of each user that queried the system.

PredictionIO Enterprise Edition provides a special evaluator component that takes the complete history of each user and produces accurate and detailed reports of how each prediction performed. Besides giving a better picture of how the prediction engine performs, in contrast to black-box A/B tests, this level of detail enables fine-tuning of the prediction engine by data scientists and engine developers.
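As an illustrative sketch of what such a data reader might do (the TrackedEvent layout is hypothetical), reconstructing each user's complete history amounts to grouping tracked events by user and ordering them in time:

// Hypothetical tracked event: the tag links a user's later actions to a query
final case class TrackedEvent(tag: String, user: String, action: String, timeMillis: Long)

// Reconstruct each user's complete, time-ordered history from the tracking data
def reconstructHistories(events: Seq[TrackedEvent]): Map[String, Seq[TrackedEvent]] =
  events.groupBy(_.user).map { case (user, es) =>
    user -> es.sortBy(_.timeMillis)
  }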

Visual Replay

Visual replay is provided for replay loops, giving more information to the operators.

Summary

The present invention helps data scientists and developers develop and deploy machine learning systems.

One embodiment provides a library/engine template gallery so developers can build their own engines or customize templates to their needs; the templates are ready to use right away and are also customizable. All engines follow the same DASE architecture described above.

Engines are deployed as web services. To unify data for predictive analytics, an event server is provided to collect the training data. The event server can connect to existing systems, such as mail servers. The system can be installed on premises, and can also be deployed on AWS or a private cloud; because of its customizability, it makes sense for users to install it on their own cloud.

Some Illustrative Benefits of the Present Invention

These benefits are illustrative of some advantages of the present invention over the prior art, and are not to be read as limiting, or as limiting the benefits of the present invention to those listed. Other benefits may also exist.

1) Differentiation between engine and algorithm

a. Focus on the engine, not just the algorithm. When doing evaluation, the system is not just evaluating the algorithm; it is also evaluating the data sources and the business logic parameters.
b. Engine-level comparison, versus algorithm parameter tuning based on the algorithm alone.
c. Tuning not just the parameters of an algorithm, but the parameters of an engine.
d. Engine parameters take into account business logic, not just the prediction accuracy of a single algorithm.
e. Multiple variants of engines can be deployed, with different algorithms.
f. The variants are chosen by the user, based on a template provided by PredictionIO, and may also be automatically generated.
g. The template gives the engine parameters that the user can tune, with default settings. The parameter generator deploys the variants. For example, engine.json contains a list of parameters that an operator can tune (see the illustrative sketch below).
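As a hedged illustration (the field names below are assumptions for exposition, not a normative schema), such an engine.json might look like the following, echoing the ALS parameters from the example above:

{
  "id": "default",
  "description": "Recommender engine variant A",
  "algorithms": [
    {
      "name": "als",
      "params": { "rank": 10, "numIterations": 20, "lambda": 0.01 }
    }
  ]
}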

2) Time Horizon

a. The time horizon on replay is much different from real-time advertising.
b. The whole lifecycle is done in prediction.
c. Real-time environment.
d. In replay, a longer time horizon of user actions can be taken into account.

3) User response versus any event, such as immediate, delayed, or multiple events.

a. When did the user click? The user might not purchase or click, but the system can keep track of how the user behaves and of all the actions the user takes on the page.
b. Sequence of actions: for example, a user might not click on any of 5 products, but may buy a product later.

4) Query is generic

a. Predicted results are generic, whereas in advertising they are specific.
b. Tracking can measure how good the predictive result is.
c. The actual consequence or conversion does not necessarily matter.

5) Replay

a. Replay means the whole situation is replayed: not simply whether the result is positive or negative, but what users will do with the predictions.
b. Replay serves the purpose of a debugger of engine performance.
c. The problem in an A/B testing scenario is that one can only tell that variant 1 performs better than variant 2. With the debugger/replay, why variant 1 does better than variant 2 can be answered and determined by the operators. For example, the operator can replay a scenario and understand the behavior of that particular engine variant.
d. One can replay cases where the engine gives a bad, or a good, recommendation, and then find out why.

6) Replay advantages

a. Visual elements in visual replay are graphical and/or textual, giving more insight.
b. User interactions.
c. How to tune the engine? The algorithm?
d. Evaluation and tuning.
e. The scenario can be changed based on replay results. For example, one can change the email header and replay how the results would perform for that engine variant.
f. Both types of predictions are supported: off-line and live evaluation. Both are off-line in one sense, but one kind (off-line) can be simulated, while the other (live evaluation) involves causality: in live evaluation you show something to the user, which affects the outcome for the user, whereas an off-line prediction does not affect the user.

CONCLUSIONS

One of ordinary skill in the art knows that the use cases, structures, schematics, and flow diagrams may be performed in other orders or combinations, but the inventive concept of the present invention remains without departing from the broader spirit of the invention. Every embodiment may be unique, and methods/steps may be either shortened or lengthened, overlapped with the other activities, postponed, delayed, and continued after a time gap, such that every user is accommodated to practice the methods of the present invention.

The present invention may be implemented in hardware and/or in software. Many components of the system, for example, network interfaces etc., have not been shown, so as not to obscure the present invention. However, one of ordinary skill in the art would appreciate that the system necessarily includes these components. A user device is hardware that includes at least one processor coupled to a memory. The processor may represent one or more processors (e.g., microprocessors), and the memory may represent random access memory (RAM) devices comprising a main storage of the hardware, as well as any supplemental levels of memory, e.g., cache memories, non-volatile or back-up memories (e.g., programmable or flash memories), read-only memories, etc. In addition, the memory may be considered to include memory storage physically located elsewhere in the hardware, e.g., any cache memory in the processor, as well as any storage capacity used as a virtual memory, e.g., as stored on a mass storage device.

The hardware of a user device also typically receives a number of inputs and outputs for communicating information externally. For interface with a user, the hardware may include one or more user input devices (e.g., a keyboard, a mouse, a scanner, a microphone, a web camera, etc.) and a display (e.g., a Liquid Crystal Display (LCD) panel). For additional storage, the hardware may also include one or more mass storage devices, e.g., a floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g., a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive, etc.) and/or a tape drive, among others. Furthermore, the hardware may include an interface with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet, among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the hardware typically includes suitable analog and/or digital interfaces to communicate with each other.

In some embodiments of the present invention, the entire system can be implemented and offered to the end-users and operators over the Internet, in a so-called cloud implementation. No local installation of software or hardware would be needed, and the end-users and operators would be allowed access to the systems of the present invention directly over the Internet, using either a web browser or similar software on a client, which client could be a desktop, laptop, mobile device, and so on. This eliminates any need for custom software installation on the client side and increases the flexibility of delivery of the service (software-as-a-service), and increases user satisfaction and ease of use. Various business models, revenue models, and delivery mechanisms for the present invention are envisioned, and are all to be considered within the scope of the present invention.

The hardware operates under the control of an operating system, and executes various computer software applications, components, programs, codes, libraries, objects, modules, etc., indicated collectively by reference numerals, to perform the methods, processes, and techniques described above.

In general, the method executed to implement the embodiments of the invention may be implemented as part of an operating system or a specific application, component, program, object, module, or sequence of instructions referred to as "computer program(s)" or "computer code(s)." The computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processors in a computer, cause the computer to perform operations necessary to execute elements involving the various aspects of the invention. Moreover, while the invention has been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments of the invention are capable of being distributed as a program product in a variety of forms, and that the invention applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution. Examples of computer-readable media include, but are not limited to, recordable-type media such as volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs), etc.), and digital and analog communication media.

Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes can be made to these embodiments without departing from the broader spirit of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense. It will also be apparent to the skilled artisan that the embodiments described above are specific examples of a single broader invention which may have greater scope than any of the singular descriptions taught. There may be many alterations made in the descriptions without departing from the spirit and scope of the present invention.

What is claimed is:
1. A system for tracking a predictive engine for replay of engine performance, comprising: a processor; an engine variant of the predictive engine stored in a digital working memory, wherein the engine variant is determined by an engine parameter set, and wherein the engine parameter set identifies at least one data source and at least one algorithm; and a non-transitory, computer-readable storage medium for storing program code, the program code when executed by the processor, causes the processor to: deploy the engine variant of the predictive engine based on the engine parameter set; receive one or more queries to the deployed engine variant from one or more end-user devices; in response to the queries, the deployed engine variant generates one or more predicted results; receive one or more actual results corresponding to the predicted results; and associate the queries, the predicted results, and the actual results with a replay tag, and record the queries, the predicted results, and the actual results with the corresponding deployed engine variant.
2. The system of claim 1, wherein the program code when executed by the processor, further causes the processor to: receive a replay request specified by one or more replay tags; and in response to the replay request, replay at least one item selected from the group consisting of the queries, the predicted results, and the actual results associated with the one or more replay tags.
3. The system of claim 1, wherein the engine parameter set is generated manually by an operator.
4. The system of claim 1, wherein the engine parameter set is determined automatically.
5. The system of claim 1, wherein the actual results comprise a sequence of user responses.
6. The system of claim 1, wherein the actual results comprise a sequence of user responses collected over a delayed time frame.
7. The system of claim 1, wherein the actual results comprise a sequence of user responses recorded from at least one cohort of users.
8. The system of claim 1, wherein the actual results are received from a datastore.
9. The system of claim 1, wherein the actual results are simulated.
10. A method of tracking a predictive engine for replay of engine performance, comprising: deploying an engine variant of the predictive engine based on an engine parameter set, wherein the engine parameter set identifies at least one data source and at least one algorithm; receiving one or more queries from one or more end-user devices; in response to the queries, the deployed engine variant generating one or more predicted results; receiving one or more actual results corresponding to the predicted results; and associating the queries, the predicted results, and the actual results with a replay tag, and recording the queries, the predicted results, and the actual results with the corresponding deployed engine variant.
11. The method of claim 10, further comprising: receiving a replay request specified by one or more replay tags; and in response to the replay request, replaying at least one item selected from the group consisting of the queries, the predicted results, and the actual results associated with the one or more replay tags.
12. The method of claim 10, wherein the engine parameter set is generated manually by an operator.
13. The method of claim 10, wherein the engine parameter set is determined automatically.
14. The method of claim 10, wherein the actual results comprise a sequence of user responses.
15. The method of claim 10, wherein the actual results comprise a sequence of user responses collected over a delayed time frame.
16. The method of claim 10, wherein the actual results comprise a sequence of user responses recorded from at least one cohort of users.
17. The method of claim 10, wherein the actual results are received from a datastore.
18. The method of claim 10, wherein the actual results are simulated.
19. A non-transitory computer-readable storage medium for tracking a predictive engine for replay of engine performance, the storage medium comprising program code stored thereon, that when executed by a processor, causes the processor to: deploy an engine variant of the predictive engine based on an engine parameter set, wherein the engine parameter set identifies at least one data source and at least one algorithm; receive one or more queries from one or more end-user devices; in response to the queries, the deployed engine variant generates one or more predicted results; receive one or more actual results corresponding to the predicted results; and associate the queries, the predicted results, and the actual results with a replay tag, and record the queries, the predicted results, and the actual results with the corresponding deployed engine variant.
20. The non-transitory computer-readable storage medium of claim 19, wherein the program code when executed by the processor, further causes the processor to: receive a replay request specified by one or more replay tags; and in response to the replay request, replay at least one item selected from the group consisting of the queries, the predicted results, and the actual results associated with the one or more replay tags.